SECTION I


INTRODUCTION 


1.1 Purpose

This report examines the hardware implementation of an efficient
decoding algorithm for the Reed-Solomon class of symbol-error-correcting
codes. The algorithm, described in Volume I of this report,
offers major simplifications relative to the more conventional BCH
(Bose-Chaudhuri-Hocquenghem) decoding algorithms [1]. The
simplifications result both from the reduced complexity of the
algorithm and from the opportunity to apply fast computational
techniques for its implementation.

1.2 Background

Error-correcting codes are useful to correct message errors
which are caused by interference, additive random noise, and other
channel disturbances. Error-correction techniques are implemented in
a two-step process. At the message source, redundant symbols are
added to the original message according to a predetermined strategy
(encoding). The encoded message is transmitted and errors may be
introduced. At the message destination, the original message is
recovered from the noisy received signal (decoding), aided by prior
knowledge of the code. Message transmission using error-correcting
codes represents an effective means of obtaining low error probability
in the decoded message.

Previous work included both the analysis of error-correcting
codes and the examination of their effective implementation.
Error-correcting codes, used with spread-spectrum modulation techniques,
were shown to be beneficial in the design of jam-resistant
communication systems. It has also been suggested that error-correcting
codes, when incorporated within the internal busing structure of a
system (or device), can impact favorably on that system's (device's)
reliability [2].

Our examination of error-correcting codes has led us to
concentrate on the Reed-Solomon class of generalized BCH symbol
error-correcting codes. The distance properties of this class of
algebraic block codes assure correction of both random isolated errors
and random burst errors. While the encoding process for Reed-Solomon
codes is relatively simple, the decoding process is complex and
generally requires a dedicated processor.

We have experimented with direct decoding of short block length
Reed-Solomon codes by implementing a code-table search algorithm
under microprocessor control [3]. Further analysis of Reed-Solomon
codes has led to the development of a transform-based decoder that
offers major simplifications relative to the more conventional
BCH decoders [4].

The decoding algorithm imposes a high degree of circuit
complexity on its associated hardware implementation. Analogies with
conventional linear digital signal processing functions aid in
partitioning the decoding hardware into sections that perform
finite-field operations (e.g., field-element multiplication, division,
and inversion). These sections can be used to develop functional LSI
hardware which performs a variety of finite-field data processing
functions. If the unique properties of finite structures are exploited
(e.g., elimination of round-off errors, multiplication by adding
"logarithms"), the development of these hardware capabilities may lead
to the use of finite-field computational methods for other linear
signal processing applications.

1.3 Scope

This is the second volume of a report concerned with transform
decoding of Reed-Solomon codes. The first volume discussed the
decoding algorithm. This volume concentrates on the logic design and
hardware implementation of the transform decoding algorithm. It
begins by outlining the concepts of transform coding and decoding of
Reed-Solomon codes. Section II is primarily an overview of the
material presented in Volume I, included here for completeness. While
reading this report one should also refer to Volume I of this
technical report [4], which contains the appropriate frame of
reference for the present volume.

In Section III, an architectural design of a Reed-Solomon encoder
and decoder is presented. The processor is reconfigurable to
accommodate a large number of different code parameters for both
maximum and sub-maximum length codes over GF(2^m), the symbol fields
ranging from four to eight bits. The maximum-length codeword that can
be processed by this design is a 255-symbol word, with each symbol
represented by eight bits. (This unit will be called the (255,k)
encoder and decoder.) Included within Section III is a detailed
description of the coding capabilities, functional partitioning,
projected hardware complexity and expected operational characteristics
of the (255,k) transform encoder and decoder.

In Section IV, a description of the logic implementation of a
reconfigurable Reed-Solomon TTL breadboard is presented. This
encoder and decoder breadboard is designed to be electronically
reconfigurable to accommodate a subset of the codes processed by
the (255,k) encoder and decoder described in Section III. Although
the breadboard is not large enough to decode all of the codes
processed by the (255,k) decoder, it operates over most of the required
fields and it effectively demonstrates the reconfigurability of the
decoder's architecture. The encoder and decoder breadboard is capable
of processing a maximum-length codeword of 51 symbols, each symbol
being represented by eight bits. (The breadboard will be called the
(51,k) encoder and decoder.) Included within Section IV is a detailed
discussion of the breadboard's coding capabilities, functional and
physical partitioning, hardware complexity and operational
characteristics.

Appendix A presents a detailed discussion of binary-extension
field multiplier structures that are used in the Reed-Solomon
error-correcting encoder and decoder. The transform encoding and
decoding of a (31,15) Reed-Solomon code, constructed over GF(2^5), is
presented by means of an example in Appendix B to aid the reader in
tracing the flow of the decoding algorithm.

SECTION II


TRANSFORM  ENCODING  AND  DECODING:  AN  OVERVIEW 

Reed-Solomon codes are symbol error-correcting linear block
codes. A particular (n,k) Reed-Solomon code, constructed over the
binary-extension field GF(2^m), has a block length of n symbols,
where k symbols (k < n) represent the information. Each of the n
symbols within the codeword can be represented as a binary m-tuple.

Reed-Solomon codes are maximum-distance separable linear block
codes. These are (n,k) codes for which the minimum distance, d_min,
between any pair of codewords is the maximum value,

    d_{\min} = n - k + 1                                        (2-1)

These codes can correct any combination of t errors and s erasures
provided the inequality

    2t + s \le n - k                                            (2-2)

is satisfied.
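
The bound can be checked numerically. The following is a minimal
sketch, not part of the original report; the function names are
hypothetical, and the inequality is read as non-strict (2t + s at most
n - k, that is, at most d_min - 1), which agrees with the later
statement that t may equal (n-k)/2.

```python
def minimum_distance(n: int, k: int) -> int:
    """Minimum distance of an (n,k) maximum-distance separable code, eq. (2-1)."""
    return n - k + 1


def is_correctable(n: int, k: int, t: int, s: int) -> bool:
    """True when t errors plus s erasures satisfy 2t + s <= n - k, eq. (2-2)."""
    return 2 * t + s <= n - k


if __name__ == "__main__":
    n, k = 31, 15                      # the (31,15) code mentioned for Appendix B
    print(minimum_distance(n, k))      # 17
    print(is_correctable(n, k, 8, 0))  # True: eight errors alone, 2*8 = 16
    print(is_correctable(n, k, 5, 6))  # True: 2*5 + 6 = 16
    print(is_correctable(n, k, 8, 1))  # False: 2*8 + 1 = 17 > 16
```

An erasure costs one unit of redundancy and an error costs two, which
is why the same code tolerates twice as many erasures as errors.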

In order to discuss the structural properties of Reed-Solomon
codes and their implementation, it is convenient to regard the
codewords as polynomials. A codeword from an (n,k) code, constructed
over GF(2^m), is an n-tuple with each symbol represented by m bits.
Each codeword can be represented by a polynomial of degree n-1, having
coefficients that are members of the finite field of 2^m elements.
Such a polynomial is determined uniquely by its n coefficients or
equivalently by its values at any n distinct points of the field.
A codeword of block length n may be specified either by a set of n
values or by the polynomial coefficients interpolated from these
values.

2.1 Finite Field Transforms Over GF(2^m): A Review

Let a_0, a_1, ..., a_{n-1} be elements of a finite field GF(2^m) of
multiplicative order 2^m - 1. Let b be an element of GF(2^m), and let
b be an nth root of unity (an element of multiplicative order n).
Assuming that n divides or is equal to 2^m - 1, the linear
transformation

    A_j = \sum_{i=0}^{n-1} a_i b^{ij}                           (2-3)

is a mapping from GF(2^m) onto itself.


For any integer r,

    \sum_{i=0}^{n-1} b^{ir} = \begin{cases} n, & r \equiv 0 \pmod{n} \\ 0, & \text{otherwise} \end{cases}    (2-4)


Equation (2-4) can be used to verify that the mapping that is inverse
to equation (2-3) is the linear transformation

    a_i = n^{-1} \sum_{j=0}^{n-1} A_j b^{-ij} ;  i = 0, 1, ..., n-1    (2-5)

where n^{-1} n = 1. Equations (2-3) and (2-5) define a discrete linear
transform pair over GF(2^m), where the operations of addition and
multiplication are defined in the same field. Addition of two field
elements from GF(2^m) is defined as the bit-by-bit modulo-2 addition
of the m-tuple representation of the field elements. Multiplication
is defined in terms of the primitive field element α. GF(2^m) has
multiplicative order 2^m - 1 and it contains an element α of the same
order. The non-zero elements of the field can be written as
α^0, α^1, α^2, ..., α^{2^m - 2}. Multiplication of two field elements
is defined as the addition (modulo 2^m - 1) of the indices of the
corresponding field elements:

    \alpha^r \cdot \alpha^s = \alpha^{(r+s)}                     (2-6)
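
The two field operations can be sketched in software as follows. This
is an illustrative sketch under stated assumptions, not the report's
hardware: the field is taken to be GF(2^4) generated by the polynomial
x^4 + x + 1, and the names ANTILOG, LOG, gf_add and gf_mul are
hypothetical. Addition is the exclusive-OR of the m-tuples, and
multiplication adds the indices ("logarithms") modulo 2^m - 1.

```python
M = 4                      # bits per symbol (assumed for illustration)
PRIM_POLY = 0b10011        # x^4 + x + 1, an assumed primitive polynomial
ORDER = (1 << M) - 1       # multiplicative order 2^m - 1 = 15

# ANTILOG[i] holds alpha^i as an m-bit integer; LOG is the inverse table.
ANTILOG = [0] * ORDER
LOG = {}
value = 1
for index in range(ORDER):
    ANTILOG[index] = value
    LOG[value] = index
    value <<= 1                       # multiply by alpha
    if value >> M:                    # reduce modulo the field polynomial
        value ^= PRIM_POLY


def gf_add(x: int, y: int) -> int:
    """Bit-by-bit modulo-2 addition of the two m-tuples."""
    return x ^ y


def gf_mul(x: int, y: int) -> int:
    """Multiply by adding indices modulo 2^m - 1; zero has no index."""
    if x == 0 or y == 0:
        return 0
    return ANTILOG[(LOG[x] + LOG[y]) % ORDER]


if __name__ == "__main__":
    r, s = 7, 11
    product = gf_mul(ANTILOG[r], ANTILOG[s])
    print(product == ANTILOG[(r + s) % ORDER])   # True: alpha^7 * alpha^11 = alpha^3
    print(gf_add(0b1010, 0b0110))                # 12, i.e. 0b1100
```

The zero element is handled separately because it is not a power of α
and therefore has no index in the table.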


The sequence a_0, a_1, ..., a_{n-1} of elements from the field GF(2^m)
can be expressed as an (n-1)th degree polynomial, a(z), where

    a(z) = a_0 + a_1 z + ... + a_{n-1} z^{n-1} = \sum_{i=0}^{n-1} a_i z^i    (2-7)

The forward transform of a sequence a_0, a_1, a_2, ..., a_{n-1} can be
obtained by the polynomial evaluation of a(z) at the n distinct
powers of the transform's kernel, b^0, b^1, ..., b^{n-1}, such that

    A_j = \sum_{i=0}^{n-1} a_i b^{ij} = a(b^j) ;  j = 0, 1, ..., n-1    (2-8)

Similarly, the inverse transform is obtained by interpolation
of the polynomial a(z) from its n known values,

    a_i = n^{-1} \sum_{j=0}^{n-1} A_j b^{-ij} = n^{-1} A(b^{-i}) ;  i = 0, 1, ..., n-1    (2-9)

where A(z) = A_0 + A_1 z + ... + A_{n-1} z^{n-1}.
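
The transform pair can be exercised with a short round-trip check. The
sketch below is an assumption-laden illustration rather than the
report's implementation: it uses GF(2^4) built from x^4 + x + 1, takes
the kernel b = α^3 (multiplicative order 5) for a 5-point transform,
and uses hypothetical function names. Because n divides 2^m - 1 it is
odd, so n times the field identity equals 1 in a field of
characteristic 2 and the factor n^{-1} reduces to 1.

```python
M, PRIM_POLY = 4, 0b10011          # GF(2^4) from x^4 + x + 1 (assumed)
ORDER = (1 << M) - 1
ANTILOG, LOG = [0] * ORDER, {}
value = 1
for index in range(ORDER):
    ANTILOG[index], LOG[value] = value, index
    value <<= 1
    if value >> M:
        value ^= PRIM_POLY


def gf_mul(x, y):
    return 0 if x == 0 or y == 0 else ANTILOG[(LOG[x] + LOG[y]) % ORDER]


def transform(seq, kernel_log, sign):
    """Evaluate sum_i seq[i] * b^(sign*i*j), j = 0..n-1, with b = alpha^kernel_log."""
    out = []
    for j in range(len(seq)):
        acc = 0
        for i, symbol in enumerate(seq):
            acc ^= gf_mul(symbol, ANTILOG[(sign * kernel_log * i * j) % ORDER])
        out.append(acc)
    return out


def forward(a, kernel_log):
    return transform(a, kernel_log, +1)        # A_j = a(b^j)


def inverse(A, kernel_log):
    # n is odd, so n * 1 = 1 in GF(2^m) and the factor n^{-1} equals 1.
    return transform(A, kernel_log, -1)        # a_i = A(b^{-i})


if __name__ == "__main__":
    a = [1, 7, 0, 12, 5]           # n = 5 symbols
    K = ORDER // len(a)            # b = alpha^3 has multiplicative order 5
    A = forward(a, K)
    print(inverse(A, K) == a)      # True
```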

2.2 Codeword Generation by Discrete Transformation

The encoding of Reed-Solomon codes can be defined in terms of
finite-field transforms. Let a_0, a_1, ..., a_{k-1} represent a sequence
of k message symbols, with each symbol represented by m bits. A
length-n message sequence can be formed by adjoining n-k consecutive
zero-valued symbols to the original length-k message sequence. We
regard the polynomial a(z) as the message polynomial

    a(z) = a_0 + a_1 z + ... + a_{n-1} z^{n-1}                  (2-10)

The first k coefficients are the k message symbols, and the remaining
n-k coefficients are zero.

A codeword for a Reed-Solomon (n,k) code, constructed over GF(2^m),
can be generated by calculating the n-point discrete transform of the
sequence represented by a(z) [4,5]. There is symmetry associated
with the transform so that either a forward or an inverse transform
may be used to encode. The only requirement is that the reverse of
the encoding transform be calculated for decoding. For compatibility
with Volume I, this review will use a forward transform for encoding.

A codeword, consisting of n symbols, is constructed by calculating
the forward transform of the length-n message sequence, as in equation
(2-8). The forward transform, or polynomial evaluation, can be
expressed as the continued product

    A_j = a_0 + b^j (a_1 + ... + b^j (a_{n-2} + b^j a_{n-1}) ... )    (2-11)
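
The continued product is the nested (Horner) form of polynomial
evaluation. The sketch below compares it with term-by-term evaluation
for a zero-padded message. It is an illustration only: the field
GF(2^4) from x^4 + x + 1, the kernel b = α, the (15,9) code parameters
and all function names are assumptions made for the example.

```python
M, PRIM_POLY = 4, 0b10011          # GF(2^4) from x^4 + x + 1 (assumed)
ORDER = (1 << M) - 1
ANTILOG, LOG = [0] * ORDER, {}
value = 1
for index in range(ORDER):
    ANTILOG[index], LOG[value] = value, index
    value <<= 1
    if value >> M:
        value ^= PRIM_POLY


def gf_mul(x, y):
    return 0 if x == 0 or y == 0 else ANTILOG[(LOG[x] + LOG[y]) % ORDER]


def evaluate_direct(a, point):
    """Sum a_i * point^i term by term."""
    acc, power = 0, 1
    for coeff in a:
        acc ^= gf_mul(coeff, power)
        power = gf_mul(power, point)
    return acc


def evaluate_continued_product(a, point):
    """Nested form a_0 + x(a_1 + x(... + x*a_{n-1})), as in eq. (2-11)."""
    acc = 0
    for coeff in reversed(a):
        acc = gf_mul(acc, point) ^ coeff
    return acc


if __name__ == "__main__":
    n, k = 15, 9
    message = [3, 0, 7, 12, 1, 9, 5, 14, 2]
    a = message + [0] * (n - k)                 # zero-padded length-n sequence
    nested = [evaluate_continued_product(a, ANTILOG[j]) for j in range(n)]
    direct = [evaluate_direct(a, ANTILOG[j]) for j in range(n)]
    print(nested == direct)                     # True
```

The nested form needs at most one field multiplication per
coefficient and never forms the powers of b^j explicitly.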

Equivalently, the forward transform can be interpreted as the
remainder of the polynomial division a(z)/(z - b^j),

    a(z) = q(z)(z - b^j) + A_j                                   (2-12)

The second interpretation may be represented as a set of polynomial
congruences such that

    A_j = a(b^j) \equiv a(z) \bmod (z - b^j) ;  j = 0, 1, ..., n-1    (2-13)

An equivalent method for obtaining A_j is to divide a(z) by a
set of small-degree polynomials containing distinct factors of the
form (z - b^j), and then to evaluate the lesser-degree residue
polynomials at the appropriate values b^j. If the set of divisor
polynomials is defined to be the set of minimal polynomials of the
non-zero field elements, then their coefficients are restricted to the
prime field GF(2). In this case, division can be performed using
only scalar multiplication by the elements of the prime field. For
codes over GF(2^m), the minimal polynomials have coefficients that
are either one or zero, requiring only operations in GF(2) for
polynomial division. In Volume I, this technique of computing a
finite-field transform was shown to be a "fast" algorithm; it tends to
minimize the number of multiplications in GF(2^m), the number
approaching n log_2 n.
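
The residue technique can be illustrated on one minimal polynomial.
The sketch below is not the fast algorithm of Volume I; it only shows
the single reduction step described above, under the assumptions that
the field is GF(2^4) built from x^4 + x + 1, that the kernel is b = α,
and that the divisor is 1 + z + z^4, whose roots in this field are α,
α^2, α^4 and α^8. Function names are hypothetical.

```python
M, PRIM_POLY = 4, 0b10011          # GF(2^4) from x^4 + x + 1 (assumed)
ORDER = (1 << M) - 1
ANTILOG, LOG = [0] * ORDER, {}
value = 1
for index in range(ORDER):
    ANTILOG[index], LOG[value] = value, index
    value <<= 1
    if value >> M:
        value ^= PRIM_POLY


def gf_mul(x, y):
    return 0 if x == 0 or y == 0 else ANTILOG[(LOG[x] + LOG[y]) % ORDER]


def poly_eval(coeffs, point):
    """Evaluate a polynomial given in ascending powers of z."""
    acc = 0
    for coeff in reversed(coeffs):
        acc = gf_mul(acc, point) ^ coeff
    return acc


def residue_mod_binary(a, divisor):
    """Remainder of a(z) modulo a monic divisor with 0/1 coefficients.

    Because every divisor coefficient is 0 or 1, the reduction uses only
    exclusive-ORs of field elements and no GF(2^m) multiplications.
    """
    rem = list(a)
    deg = len(divisor) - 1
    for top in range(len(rem) - 1, deg - 1, -1):
        lead = rem[top]
        if lead:
            for i, bit in enumerate(divisor):
                if bit:
                    rem[top - deg + i] ^= lead
    return rem[:deg]


if __name__ == "__main__":
    a = [3, 0, 7, 12, 1, 9, 5, 14, 2, 0, 0, 0, 0, 0, 0]
    min_poly = [1, 1, 0, 0, 1]                 # 1 + z + z^4
    residue = residue_mod_binary(a, min_poly)  # degree at most 3
    ok = all(poly_eval(residue, ANTILOG[j]) == poly_eval(a, ANTILOG[j])
             for j in (1, 2, 4, 8))
    print(len(residue), ok)                    # 4 True
```

Four transform values are thus obtained by evaluating a degree-3
residue instead of the full degree-14 polynomial.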




2.3 Reed-Solomon Transform Decoding

Finite-field transforms can be applied to decode Reed-Solomon
codewords. If the source message is represented by the polynomial
expressed in equation (2-10), then the transmitted codeword is
represented by the polynomial
A(z) = A_0 + A_1 z + ... + A_{n-1} z^{n-1}, where the
coefficients A_j are determined as a(b^j) in accordance with equation
(2-8). If the inverse finite-field transform, equation (2-5), is
applied to the transmitted codeword, the message polynomial a(z) is
obtained and the original k message symbols are recovered.

Assume that an error sequence represented by the polynomial
E(z) = E_0 + E_1 z + ... + E_{n-1} z^{n-1} has been added to the encoded
message A(z) during transmission. In order for the received word to be
correctable, E(z) cannot have more than (n-k)/2 non-zero coefficients;
their values and locations are unknown. The received sequence is
represented by the polynomial sum R(z) = E(z) + A(z). The inverse
transform of the received sequence is the polynomial sum
r(z) = e(z) + a(z), where e(z) is the inverse transform of the error
polynomial E(z), and a(z) is the original length-n message polynomial.
The decoding problem is to determine e(z) from the transform r(z) of
the observed sequence R(z).

To decode, the polynomial r(z) is calculated from the known
values of the received sequence R(z) by taking its inverse transform,

    r_i = \sum_{j=0}^{n-1} R_j b^{-ij} ;  i = 0, 1, ..., n-1     (2-14)

which is equivalent to evaluating the received polynomial R(z) at
the n values b^0, b^{-1}, ..., b^{-(n-1)}.

The symbols a_i, i > k-1, are equal to zero by definition. A
sequence can be separated from equation (2-14), valid for
i = k, k+1, ..., n-1:

    s_i = r_i = \sum_{j=0}^{n-1} R_j b^{-ij} ;  i = k, k+1, ..., n-1    (2-15)

Because a_i = 0 for these indices, each s_i equals the corresponding
coefficient e_i of the error transform. This sequence, {s_i}, is the
error syndrome associated with the channel error pattern, E(z).
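
The separation of the syndrome can be checked numerically. The sketch
below is illustrative only and assumes a (15,9) code over GF(2^4)
(field polynomial x^4 + x + 1, kernel b = α); the function names are
hypothetical. It shows that the last n-k inverse-transform
coefficients depend only on the error pattern, not on the message.

```python
M, PRIM_POLY = 4, 0b10011          # GF(2^4) from x^4 + x + 1 (assumed)
ORDER = (1 << M) - 1
ANTILOG, LOG = [0] * ORDER, {}
value = 1
for index in range(ORDER):
    ANTILOG[index], LOG[value] = value, index
    value <<= 1
    if value >> M:
        value ^= PRIM_POLY

N, K = 15, 9                        # assumed code parameters, kernel b = alpha


def gf_mul(x, y):
    return 0 if x == 0 or y == 0 else ANTILOG[(LOG[x] + LOG[y]) % ORDER]


def transform(seq, sign):
    """sum_i seq[i] * alpha^(sign*i*j) for j = 0..N-1 (n^{-1} = 1 since N is odd)."""
    out = []
    for j in range(len(seq)):
        acc = 0
        for i, symbol in enumerate(seq):
            acc ^= gf_mul(symbol, ANTILOG[(sign * i * j) % ORDER])
        out.append(acc)
    return out


def encode(message):
    return transform(message + [0] * (N - K), +1)   # forward transform, eq. (2-8)


def syndrome(received):
    return transform(received, -1)[K:]              # s_i = r_i, i = k..n-1


if __name__ == "__main__":
    error = [0] * N
    error[2], error[11] = 5, 9                       # two channel errors
    msg1 = [3, 0, 7, 12, 1, 9, 5, 14, 2]
    msg2 = [1, 1, 1, 1, 1, 1, 1, 1, 1]
    s1 = syndrome([c ^ e for c, e in zip(encode(msg1), error)])
    s2 = syndrome([c ^ e for c, e in zip(encode(msg2), error)])
    print(s1 == s2 == transform(error, -1)[K:])      # True
```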

The error syndrome can be used to determine the locations of
the errors in the channel error pattern E(z), using the iterative
algorithm developed by Berlekamp and Massey [6,7]. This algorithm
calculates the coefficients of the error-locator polynomial,

    \sigma(z) = \prod_{i=1}^{t} (z - X_i) = \sigma_t + \sigma_{t-1} z + ... + z^t    (2-16)

whose distinct roots are the error locations. In equation (2-16) t
is the number of non-zero coefficients of E(z), or equivalently the
number of errors that occurred. We assume t ≤ (n-k)/2 so that
the error bound of the code is not exceeded. The error-locator
polynomial is the characteristic polynomial of the shortest linear
feedback shift register (LFSR) that satisfies uniquely a linear
recursion relationship between the n-k syndrome values and the
coefficients of the error-locator polynomial,

    s_j \sigma_t + s_{j+1} \sigma_{t-1} + ... + s_{j+t-1} \sigma_1 + s_{j+t} = 0    (2-17)

where k ≤ j ≤ n-1-t.

The Berlekamp-Massey algorithm uses as its inputs the error syndrome
values and provides an iterative method for synthesizing the shortest
LFSR that has the characteristic polynomial σ(z). Once the LFSR has
been synthesized by this algorithm, it is necessary only to continue
its operation, with zero input, for an additional k shifts in order
to extrapolate the k unknown values of the error transform e(z).
These values are subtracted from the corresponding values of r(z) in
order to produce the corrected message, a(z).
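
The complete errors-only procedure can be sketched in software. This
is a minimal sketch under stated assumptions and not the report's
hardware design: it uses the textbook form of the Berlekamp-Massey
LFSR synthesis, a (15,9) code over GF(2^4) with field polynomial
x^4 + x + 1 and kernel b = α, and hypothetical function names. The
synthesized register is free-run for k further shifts; because the
error transform is periodic with period n, those outputs are
e_0, ..., e_{k-1}.

```python
M, PRIM_POLY = 4, 0b10011            # GF(2^4) from x^4 + x + 1 (assumed)
ORDER = (1 << M) - 1
ANTILOG, LOG = [0] * ORDER, {}
value = 1
for index in range(ORDER):
    ANTILOG[index], LOG[value] = value, index
    value <<= 1
    if value >> M:
        value ^= PRIM_POLY


def gf_mul(x, y):
    return 0 if x == 0 or y == 0 else ANTILOG[(LOG[x] + LOG[y]) % ORDER]


def gf_inv(x):
    return ANTILOG[(-LOG[x]) % ORDER]


def transform(seq, sign):
    """sum_i seq[i] * alpha^(sign*i*j), j = 0..n-1, kernel b = alpha, n = 15."""
    out = []
    for j in range(len(seq)):
        acc = 0
        for i, symbol in enumerate(seq):
            acc ^= gf_mul(symbol, ANTILOG[(sign * i * j) % ORDER])
        out.append(acc)
    return out


def berlekamp_massey(s):
    """Shortest LFSR (connection polynomial C, length L) that generates s."""
    C, B = [1], [1]                  # current and previously saved polynomials
    L, shift, b = 0, 1, 1
    for pos in range(len(s)):
        d = s[pos]                   # discrepancy against the LFSR prediction
        for i in range(1, L + 1):
            d ^= gf_mul(C[i], s[pos - i])
        if d == 0:
            shift += 1
            continue
        scale = gf_mul(d, gf_inv(b))
        previous = list(C)
        C = C + [0] * (len(B) + shift - len(C))
        for i, coeff in enumerate(B):
            C[i + shift] ^= gf_mul(scale, coeff)
        if 2 * L <= pos:
            L, B, b, shift = pos + 1 - L, previous, d, 1
        else:
            shift += 1
        C = C + [0] * (L + 1 - len(C))
    return C, L


def encode(message, n):
    return transform(message + [0] * (n - len(message)), +1)


def decode(received, k):
    n = len(received)
    r = transform(received, -1)      # r_i = a_i + e_i
    e = list(r[k:])                  # syndrome: s_i = e_i for i = k..n-1
    C, L = berlekamp_massey(e)
    for _ in range(k):               # free-run the LFSR for k more shifts
        nxt = 0
        for i in range(1, L + 1):
            nxt ^= gf_mul(C[i], e[-i])
        e.append(nxt)
    e_low = e[n - k:]                # e_n .. e_{n+k-1}, equal to e_0 .. e_{k-1}
    return [r[i] ^ e_low[i] for i in range(k)]


if __name__ == "__main__":
    message = [3, 0, 7, 12, 1, 9, 5, 14, 2]
    received = encode(message, 15)
    for position, magnitude in ((2, 5), (7, 9), (11, 1)):   # three errors
        received[position] ^= magnitude
    print(decode(received, 9) == message)                    # True
```

Subtraction and addition coincide in GF(2^m), so the correction step
is an exclusive-OR of r_i with the extrapolated e_i.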

2.3.1  Correction  of  Errors  and  Erasures 

The  previously  described  decoding  algorithm  has  been  concerned 
only  with  correcting  errors.  A  Reed-Solomon  code  can  correct  twice 
as  many  erasures  as  errors;  it  can  correct  any  pattern  of  t  errors 
and  s  erasures  provided  the  inequality  of  equation  (2-2)  is  satisfied. 

A  useful  Reed-Solomon  decoder  should  be  capable  of  correcting  both 
errors  and  erasures. 

A method of correction for errors and erasures is to initialize
the error-locator algorithm (Berlekamp-Massey) with the connection
polynomial computed from the known erasure locations, and then continue
the algorithm normally to synthesize an errata-locator polynomial
which is the product of the error-locator polynomial and the
erasure-locator polynomial [4]. Once the errata-locator polynomial is
synthesized, there is no further distinction between errors and
erasures, and the inverse transform of the errata pattern may be
extrapolated by free-running the synthesized LFSR as before. These
values are then subtracted from the corresponding values of r(z) in
order to decode the correct message.

The erasure-locator polynomial, λ(z), is defined as

    \lambda(z) = \prod_{i=1}^{s} (z - x_i) = \lambda_s + \lambda_{s-1} z + ... + z^s    (2-18)

where s erasures have occurred, not exceeding the minimum-distance
bound of equation (2-2). The roots, x_i, designate the known erasure
locations, forming a set that is disjoint from the error locations,
X_i. The modified Berlekamp-Massey algorithm iteratively calculates
the errata-locator polynomial, σ'(z), which is the product of the
error-locator and erasure-locator polynomials:

    \sigma'(z) = \sigma(z) \lambda(z)                            (2-19)

The errata-locator polynomial is then used to generate the transform
of the channel errata pattern, which is subtracted from the transform
of the received data to obtain the decoded message.

2.4 Transform Encoding and Decoding: Hardware Structures

The preceding review concerning the transform encoding and decoding
of Reed-Solomon codes was meant to be general in nature. The required
computational steps and procedures do not imply uniqueness of hardware
implementation. For example, transform codeword generation requires
that n-k consecutive zeros be padded to the original k information
symbols in order to form the length-n message sequence. In section
2.2, the zero-padding was defined so that the first k coefficients
were the k information symbols, and the remaining n-k coefficients
were zero. This zero-padding placement is not unique; the cyclic
properties of the code result in many possible zero-padding
placements. Each results in a slightly different design and physical
implementation for the transform encoder and decoder, without
modifying the general algorithm.

There is also symmetry associated with transform encoding and
decoding. A forward transform may be defined for encoding; an inverse
transform would then be required for decoding. Alternately, an
inverse transform may be defined for encoding and a forward transform
for decoding. Either approach is correct; only their hardware
implementations differ.

Regardless of the particular variations, all transform-based
encoders and decoders will have common characteristics. A
representative block diagram of a communication system that uses
transform error-correction encoding and decoding techniques is shown
in Figure 1. The first step in encoding requires that the k
information symbols be padded to n symbols with n-k zeros. The second
step in encoding is the calculation of the n-point discrete forward
(or inverse) linear transformation. These n symbols are then
transmitted and corrupted by noise in the channel. At the receiver,
the noisy symbols are observed and the symbols that are erasures are
identified. The received symbols and the locations of the known
erasures are sent to the decoder. The decoder first calculates the
required n-point discrete inverse (or forward) linear transformation.
The known erasure locations are used to initialize the
errata-location section with the erasure-locator polynomial. The n-k
syndrome values are separated from the transform of the received
symbols and are used as inputs to the errata locator. This section
calculates the errata-locator polynomial as in equation (2-19). The
errata-location section then calculates the transform of the errata
that occur during transmission. This data is subtracted from the
transform of the received data, recovering the k original information
symbols. The total number of errors and erasures is assumed to be
within the bound of the code given in equation (2-2).


SECTION  III 


A (255,k) REED-SOLOMON TRANSFORM ENCODER AND DECODER

A design at the detailed logic level of a transform encoder and
decoder for use with the Reed-Solomon class of symbol error-correcting
codes is described in this section. It is a computationally efficient
implementation of the transform decoding algorithm described in
Volume I of this report (and summarized in Section II of this volume).
The encoder and decoder can implement a 255-symbol block-length code,
as well as many shorter codes. It is designated as the (255,k)
encoder and decoder.

3.1 General Description

A useful error-correcting encoder and decoder should operate with a
number of different code parameters in order to be applicable to various
channel characteristics and system designs. The error controller's
hardware implementation must be capable of implementing different
block lengths and different symbol alphabets. To encode and decode
an (n,k) code constructed over GF(2^m), the hardware must implement
an n-point finite-field transform where each symbol in the transform
is represented by m bits. The ability to calculate transforms of
different lengths over different finite fields requires that the
hardware be able to implement algebraic operations that are defined in
the different fields. The essential algebraic operations that must
be implemented are field-element addition, multiplication, and
inversion. Field-element multiplication is defined uniquely for each
binary-extension field, and the hardware that implements
multiplication in one field must be reconfigured to multiply correctly
in another. (See Appendix A of this report for a more detailed
description of GF(2^m) multiplier structures.) In general the
implementation of a versatile encoder and decoder requires hardware
reconfigurability to operate successfully in different
binary-extension fields.


Our implementation of the transform decoding algorithm was
designed to minimize the total number of binary-extension-field
multiplications required for both encoding and decoding [4]. The
(255,k) encoder and decoder was designed to operate with serial input
code symbols so that many of the required finite-field multiplications
can be calculated sequentially in time using the same hardware. The
resulting architecture tends to minimize the total number of GF(2^m)
multipliers that have to be implemented, minimizing the amount of
hardware reconfigurability and the resulting hardware complexity
required to accommodate the codes from the different binary-extension
fields.

The natural partitioning of the transform decoding algorithm
separates the decoder's structure into a transform section, an
errata-location section, and a control section. The encoding algorithm
partitions the encoder into a transform section and a control section.
We developed the logical design of a general transformer that
implements a computationally efficient number-theoretic transform
algorithm [10]. The transform section was designed to calculate both a
forward and an inverse discrete transform over the fields of interest.
The same structure can be used for both encoding and decoding,
resulting in a considerable saving in hardware design and fabrication,
thus rendering it suitable for a VLSI chip-set implementation.

The control section provides data management to the transform
and errata-location sections. This control can be implemented using
standard TTL logic or dedicated LSI or VLSI circuitry. Control also
could be provided by use of a software-programmable microprocessor.
This report is not concerned further with the detailed design of the
control section. The architectural design of the transform and
errata-location sections, which carry out the major computational steps
in the encoding and decoding algorithms, is emphasized in this section.

The hardware complexity associated with both the transform
section and the errata-location section is such that each section
could be implemented using a single VLSI monolithic device [11].
This level of complexity is fundamental to the concept of a versatile
encoder and decoder. Since the same transformer can be used for
computing either a forward or an inverse discrete transform, a complete
encoder and decoder can be implemented using only two devices. A
transform "chip" and an errata-location "chip" would be required for
decoding, while only the transform "chip" would be required for
encoding. The hardware necessary to perform encoding is inherently
contained within the hardware required for decoding.

3.1.1 Coding Capabilities

The (255,k) Reed-Solomon encoder and decoder was designed to
provide a selection of useful codes while containing the complexity
of the projected hardware. The range of Reed-Solomon codes that
can be processed by the (255,k) encoder and decoder design is shown
in Table I. These codes represent a large number of both maximum
and submaximum length codes over GF(2^m) where the symbol
representation, m, ranges from four to eight bits.

The errata-location section's architecture is bit-slice and
expandable to accommodate any code rate; each symbol used for
redundancy requires a corresponding hardware slice within the
decoder. However, it is desirable that the errata-locator be
implementable as a single integrated circuit, and this requirement
restricts the errata-locator's implementation to a size (total number
of transistors) that can process a maximum of 128 symbols used for
redundancy. Since Reed-Solomon codes are maximum distance separable
codes, this restricts the largest value of d_min to 129 and
equivalently restricts the largest number of syndrome symbols to 128.
The (255,k) decoder is consequently designed to operate with a maximum
of 128 syndrome symbols, regardless of code block size. For a
Reed-Solomon (n,k) code, the number of syndrome symbols is (n-k).
Accordingly, the (255,k) decoder can correct all combinations of t
errors and s erasures, provided the inequality

    2t + s \le n - k \le 128                                    (3-1)

is satisfied.

[Table I]

The Reed-Solomon (255,k) encoder and decoder design can
accommodate 588 distinct codes defined by different allowed choices of
the parameters n and k. This number is derived from the maximum number
of syndrome symbols and the variety of code classes that can be
processed. For example, the (255,k) class of codes, constructed over
GF(2^8), represents a family of codes whose block length is fixed at
255 symbols but whose number of information symbols, k, is a variable.
The design trade-offs which restrict the maximum number of allowable
syndrome symbols to 128 define a total of 128 distinct codes in this
class (i.e., k can range from 127 to 255). For the other families of
(n,k) codes shown in Table I, k can range from 1 to n. Some of these
codes are trivial but most are not. Table II indicates the seventeen
approximately half-rate codes that can be accommodated by the (255,k)
encoder and decoder's design.

3.2 (255,k) Transform Encoder and Decoder Architecture

The (255,k) encoder and decoder is partitioned into a transform
section and an errata-location section. The transform section
implements either a forward or an inverse n-point discrete transform
and it is used for either encoding or decoding. The errata-location
section implements a modified version of the Berlekamp-Massey
minimal-length LFSR synthesis algorithm. This algorithm, used for
decoding, corrects erasures as well as errors. Both the transform
section and the errata-location section are reconfigurable to operate
over the binary-extension fields, GF(2^m), with m ranging from four to
eight bits.


Table II


Half-Rate Codes Accommodated by the (255,k)
Reed-Solomon Transform Decoder


    CODE (n,k)      BITS PER SYMBOL m

    (255,127)       8
    (127,63)        7
    (85,42)         8
    (63,31)         6
    (51,25)         8
    (31,15)         5
    (21,10)         6
    (17,8)          8
    (15,7)          8, 4
    (9,4)           6
    (7,3)           6
    (5,2)           8, 4
    (3,1)           8, 4, 6

3.2.1  Transform  Section 

The range of transforms that can be calculated by the transform
section of the (255,k) encoder and decoder is shown in Table III.
This table indicates the number of symbols in the transform, n, the
number of bits per symbol, m (specifying the field of operation
GF(2^m)), and the kernel of the transform, α^K. For an n-point
transform over GF(2^m), the kernel is an nth root of unity; that is,
α^K is an element of GF(2^m) of multiplicative order n, so that n is
the least integer for which α^{Kn} = 1.

To calculate an n-point forward transform over GF(2^m), the transform
section must implement

    A_j = \sum_{i=0}^{n-1} a_i \alpha^{Kij},   j = 0, 1, \ldots, n-1      (3-2)

where

    A_j, a_i \in GF(2^m);   0 \le i, j \le n-1

and

    \alpha^K \in GF(2^m), with multiplicative order n.


Table III

Transform Capabilities of (255,k) Decoder's Transform Section

Transform Size    Bits Per Symbol    Kernel of Transform
      n                  m                  α^K

     255                 8                  α^1
     127                 7                  α^1
      85                 8                  α^3
      63                 6                  α^1
      51                 8                  α^5
      31                 5                  α^1
      21                 6                  α^3
      17                 8                  α^15
      15                 8                  α^17
      15                 4                  α^1
       9                 6                  α^7
       7                 6                  α^9
       5                 8                  α^51
       5                 4                  α^3
       3                 8                  α^85
       3                 6                  α^21
       3                 4                  α^5


As mentioned in section II, a forward transform can be interpreted
as polynomial evaluation, which can be represented as

    A_j = a(\alpha^{Kj}),   j = 0, 1, \ldots, n-1                         (3-3)

where a(x) is an (n-1)th degree polynomial over GF(2^m).

To calculate an inverse n-point finite-field transform over GF(2^m),
the transform section must implement

    a_i = \sum_{j=0}^{n-1} A_j \alpha^{-Kij},   i = 0, 1, \ldots, n-1     (3-4)

or equivalently,

    a_i = A(\alpha^{-Ki}),   i = 0, 1, \ldots, n-1                        (3-5)

where A(z) is an (n-1)th degree polynomial defined over GF(2^m).
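
The transform pair can be illustrated with a minimal sketch for the
5-point transform over GF(2^4) with kernel α^3 (one of the entries of
Table III). The field is assumed here to be generated by x^4 + x + 1
with α = x; the function names are illustrative and the
double loop is a direct, unoptimized rendering of the two sums rather
than the hardware algorithm.

```python
def gf_mul(a: int, b: int, m: int = 4, poly: int = 0b10011) -> int:
    """Multiply two elements of GF(2^m), reducing modulo `poly`."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> m:          # degree reached m: subtract the field polynomial
            a ^= poly
    return result


def gf_pow(a: int, e: int) -> int:
    """Raise a to the non-negative integer power e."""
    result = 1
    for _ in range(e):
        result = gf_mul(result, a)
    return result


ALPHA = 0b0010  # assumed primitive element, alpha = x


def transform(points, kernel):
    """out[j] = sum_i points[i] * kernel^(i*j); addition is XOR."""
    n = len(points)
    out = []
    for j in range(n):
        acc = 0
        for i, p in enumerate(points):
            acc ^= gf_mul(p, gf_pow(kernel, (i * j) % n))
        out.append(acc)
    return out


def forward(a, K=3):
    """Forward transform with kernel alpha^K."""
    return transform(a, gf_pow(ALPHA, K))


def inverse(A, K=3, m=4):
    """Inverse transform with kernel alpha^(-K) = alpha^(2^m - 1 - K)."""
    return transform(A, gf_pow(ALPHA, (1 << m) - 1 - K))


a = [1, 7, 0, 12, 5]
assert inverse(forward(a)) == a
```

No scaling by 1/n appears because n is odd and the field has
characteristic two, so n reduces to one.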

To calculate either a forward or an inverse finite-field transform,
the transform section implements a polynomial evaluation algorithm.
To calculate a forward n-point transform over GF(2^m), the transform
section evaluates an (n-1)th degree polynomial at the n distinct
powers of the element α^K. To calculate an n-point inverse
finite-field transform over the same field, the transform section
still evaluates an (n-1)th degree polynomial at the n distinct powers
of the element α^K, but the order of evaluation is reversed since

    \alpha^{-K\ell} = \alpha^{K(n-\ell)}

The transform section implements a computationally efficient
algorithm for polynomial evaluation. For an n-point transform, the
n points to be transformed are defined as the coefficients of an
(n-1)th degree polynomial over GF(2^m). This data polynomial is divided
by a set of polynomials of smaller degree whose roots are conjugate
sets of the n distinct powers of α^K. Each remainder polynomial, or
residue polynomial, is then evaluated at each of the conjugate roots
of its corresponding divisor polynomial in order to obtain the
transformed points. The set of divisor polynomials is the same for
either a forward or an inverse transform; the order of evaluation
determines which transform is being calculated.
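
The divide-then-evaluate idea can be sketched for the 15-point
transform over GF(2^4). The divisor polynomials are the five GF(2^4)
entries of Table IV-3; the field is assumed to be generated by
m1(z) = 1 + z + z^4 with α = z, and all function names are
illustrative. The sketch checks only that evaluating the residues
gives the same points as evaluating the full data polynomial; it does
not model the hardware.

```python
def gf_mul(a: int, b: int, m: int = 4, poly: int = 0b10011) -> int:
    """Multiply two elements of GF(2^m), reducing modulo `poly`."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> m:
            a ^= poly
    return result


def gf_pow(a: int, e: int) -> int:
    result = 1
    for _ in range(e):
        result = gf_mul(result, a)
    return result


ALPHA = 0b0010  # assumed primitive element

# Minimal polynomials over GF(2^4); bit i is the coefficient of z^i.
MIN_POLYS = {0: 0b00011, 1: 0b10011, 3: 0b11111, 5: 0b00111, 7: 0b11001}


def coset_leader(j: int, n: int = 15) -> int:
    """Smallest exponent in the conjugate set {j, 2j, 4j, 8j} mod n."""
    return min((j * 2 ** t) % n for t in range(4))


def poly_rem(data, divisor: int):
    """Remainder of data(z) (low order first) divided by a binary polynomial."""
    deg = divisor.bit_length() - 1
    r = list(data)
    for i in range(len(r) - 1, deg - 1, -1):
        lead = r[i]
        if lead:
            # Divisor coefficients are 0 or 1, so subtraction is a plain XOR.
            for t in range(deg + 1):
                if (divisor >> t) & 1:
                    r[i - deg + t] ^= lead
    return r[:deg]


def evaluate(coeffs, x: int) -> int:
    """Evaluate a polynomial (low order first) at x by summing powers."""
    acc, power = 0, 1
    for c in coeffs:
        acc ^= gf_mul(c, power)
        power = gf_mul(power, x)
    return acc


def direct_transform(data):
    return [evaluate(data, gf_pow(ALPHA, j)) for j in range(len(data))]


def fast_transform(data):
    residues = {lead: poly_rem(data, p) for lead, p in MIN_POLYS.items()}
    return [evaluate(residues[coset_leader(j)], gf_pow(ALPHA, j))
            for j in range(len(data))]


data = [(7 * i + 3) % 16 for i in range(15)]
assert fast_transform(data) == direct_transform(data)
```

Each residue has at most four coefficients, so every point costs a
short evaluation instead of a 15-term one.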

An n-point transform pair is defined on GF(2^m) if n divides
2^m - 1. If n equals 2^m - 1, then the transform is maximum-length,
the set of divisor polynomials is the set of all minimal polynomials
associated with GF(2^m), and the n points of evaluation are the
2^m - 1 non-zero field elements. If K is greater than one, where
Kn = 2^m - 1, then the transform is submaximum-length and the set of
divisor polynomials is defined as the set of minimal polynomials that
have the n distinct powers of α^K as roots. The points at which the
residue polynomials are evaluated are the n powers of α^K.

In order to evaluate an ℓth degree remainder polynomial at the
point α^j, the following equation is implemented:

    r(\alpha^j) = r_0 + r_1 (\alpha^j) + r_2 (\alpha^j)^2 + \cdots + r_\ell (\alpha^j)^\ell      (3-6)

This is equivalent to a continued product expansion

    r(\alpha^j) = (\cdots((r_\ell \alpha^j + r_{\ell-1}) \alpha^j + r_{\ell-2}) \alpha^j + \cdots + r_1) \alpha^j + r_0      (3-7)


This  expansion  can  be  effectively  implemented  using  an  extension 
field  multiplier  and  accumulator  as  a  polynomial  evaluator.  The 
symbol  errata-locator  requires  one  transformed  symbol  at  a  time,  and 
the  transform  section  is  required  to  supply  sequentially-calculated 
transform  points.  A  single  polynomial-evaluator  circuit  may  be 
multiplexed  to  calculate  the  desired  n  transform  symbols. 
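
A multiplier-and-accumulator evaluator of this kind can be sketched in
software as a loop that multiplies a running value by the evaluation
point and then adds the next coefficient. The sketch below assumes
GF(2^4) generated by x^4 + x + 1; the names horner and power_sum are
illustrative, and the comparison against the term-by-term sum is only
a consistency check.

```python
def gf_mul(a: int, b: int, m: int = 4, poly: int = 0b10011) -> int:
    """Multiply two elements of GF(2^m), reducing modulo `poly`."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> m:
            a ^= poly
    return result


def horner(coeffs, x: int) -> int:
    """Continued-product evaluation; coeffs are r_0, r_1, ..., r_l."""
    acc = 0                            # accumulator starts cleared
    for c in reversed(coeffs):         # most significant coefficient first
        acc = gf_mul(acc, x) ^ c       # one multiply, one XOR per coefficient
    return acc


def power_sum(coeffs, x: int) -> int:
    """Term-by-term evaluation r_0 + r_1 x + r_2 x^2 + ..."""
    acc, power = 0, 1
    for c in coeffs:
        acc ^= gf_mul(c, power)
        power = gf_mul(power, x)
    return acc


r = [3, 0, 9, 14]
assert horner(r, 6) == power_sum(r, 6)
```

The continued-product form needs no separate computation of the
powers of the evaluation point, which is why a single multiplier
suffices.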

The operation of the transform section can be partitioned into
two functions. The transformer first divides the (n-1)th degree
data polynomial simultaneously by all minimal polynomials of the
elements of GF(2^m). Then, each point in the transform is sequentially
calculated by evaluating the appropriate residue polynomial at the
corresponding element in the field. The order of evaluation determines
whether the transform is forward or inverse. A block diagram of the
transform section is shown in Figure 2. In this figure, the transform
section is partitioned into a polynomial residue calculator, a
multiplexer, a polynomial residue evaluator, and an arithmetic
controller.

3.2.1.1 Polynomial Residue Calculator. As a first step in calculating
an n-point transform over GF(2^m), the polynomial residue calculator
simultaneously divides the polynomial representing the data to be
transformed by all minimal polynomials of the elements of GF(2^m).
Polynomial division is implemented with LFSRs whose feedback-connection
polynomials are defined to be the divisor polynomials. The fast
polynomial evaluation algorithm defines the divisor polynomials to be
the minimal irreducible polynomials from the finite field of operation.
This means that the divisor polynomials for the residue calculators are
irreducible over GF(2) and the coefficients of the corresponding LFSRs'
feedback-connection polynomials are restricted to either one or zero.
Therefore, no extension-field multiplications are required to
implement the division portion of the fast polynomial evaluation
algorithm. Division can be implemented using only scalar multiplication
(by either zero or one) and GF(2) (modulo-two) addition.

The polynomial residue calculator is capable of dividing by all
the minimal polynomials in GF(2^m), where m = 4, 5, 6, 7, and 8. A
complete list of these polynomials is presented in Tables IV-1 through
IV-3. There are a total of 66 polynomials for the five different
binary-extension fields. In order to provide all of the transform
capabilities indicated in Table III, the polynomial residue calculator
must be capable of dividing by all 66 minimal polynomials; however,
only simultaneous division by the polynomials from the field of
operation is required for calculating a particular transform. The
binary-extension field GF(2^8) has the largest number of minimal
polynomials; there are 35 divider circuits to be implemented for
transformation in this field. The key to minimizing the residue
calculator's hardware is to design these 35 circuits to be
reconfigurable in order to provide for division by the remaining 31
minimal polynomials needed for transformation in the four other
finite fields.
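
The number of divider circuits needed in a given field can be checked
by enumerating conjugate sets of exponents, since each minimal
polynomial corresponds to one such set (a cyclotomic coset under
doubling modulo 2^m - 1). The sketch below is a cross-check under that
standard correspondence, not a description of the hardware; the
function names are illustrative.

```python
def cyclotomic_cosets(m: int):
    """Conjugate sets {s, 2s, 4s, ...} modulo 2^m - 1, each sorted."""
    order = (1 << m) - 1
    seen, cosets = set(), []
    for s in range(order):
        if s in seen:
            continue
        coset, e = [], s
        while e not in coset:
            coset.append(e)
            e = (2 * e) % order
        seen.update(coset)
        cosets.append(sorted(coset))
    return cosets


def required_divisors(m: int, n: int):
    """Indices i of the minimal polynomials m_i(z) whose roots are
    powers of the kernel alpha^K, where K * n = 2^m - 1."""
    K = ((1 << m) - 1) // n
    # A conjugate set consists entirely of multiples of K or contains none.
    return [c[0] for c in cyclotomic_cosets(m) if c[0] % K == 0]


print(len(cyclotomic_cosets(8)))   # prints 35
print(required_divisors(6, 21))    # prints [0, 3, 9, 15, 21, 27]
```

The size of each conjugate set equals the degree of the corresponding
minimal polynomial, so no set has more than m members.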

Table IV-1

Minimal Irreducible Polynomials over GF(2^8)

m_i(z) = m_0 + m_1 z + m_2 z^2 + m_3 z^3 + m_4 z^4 + m_5 z^5 + m_6 z^6 + m_7 z^7 + m_8 z^8

Polynomial    m0  m1  m2  m3  m4  m5  m6  m7  m8

m0(z)          1   1   0   0   0   0   0   0   0
m1(z)          1   0   1   1   1   0   0   0   1
m3(z)          1   1   1   0   1   1   1   0   1
m5(z)          1   1   0   0   1   1   1   1   1
m7(z)          1   0   0   1   0   1   1   0   1
m9(z)          1   0   1   1   1   1   0   1   1
m11(z)         1   1   1   0   0   1   1   1   1
m13(z)         1   1   0   1   0   1   0   0   1
m15(z)         1   1   1   0   1   0   1   1   1
m17(z)         1   1   0   0   1   0   0   0   0
m19(z)         1   0   1   0   0   1   1   0   1
m21(z)         1   1   0   1   0   0   0   1   1
m23(z)         1   1   0   0   0   1   1   0   1
m25(z)         1   1   0   1   1   0   0   0   1
m27(z)         1   1   1   1   1   1   0   0   1
m29(z)         1   0   1   1   0   0   0   1   1
m31(z)         1   0   1   1   0   1   0   0   1
m37(z)         1   1   1   1   1   0   1   0   1
m39(z)         1   0   0   1   1   1   1   1   1
m43(z)         1   1   0   0   0   0   1   1   1
m45(z)         1   0   0   1   1   1   0   0   1
m47(z)         1   0   0   1   0   1   0   1   1
m51(z)         1   1   1   1   1   0   0   0   0
m53(z)         1   1   1   0   0   0   0   1   1
m55(z)         1   0   0   0   1   1   0   1   1
m59(z)         1   0   1   1   0   0   1   0   1
m61(z)         1   1   1   1   0   0   1   1   1
m63(z)         1   0   1   1   1   0   1   1   1
m85(z)         1   1   1   0   0   0   0   0   0
m87(z)         1   1   0   0   0   1   0   1   1
m91(z)         1   0   1   0   1   1   1   1   1
m95(z)         1   1   1   1   1   0   0   1   1
m111(z)        1   1   0   1   1   1   1   0   1
m119(z)        1   0   0   1   1   0   0   0   0
m127(z)        1   0   0   0   1   1   1   0   1


Table IV-2

Minimal Irreducible Polynomials over GF(2^7)

m_i(z) = m_0 + m_1 z + m_2 z^2 + m_3 z^3 + m_4 z^4 + m_5 z^5 + m_6 z^6 + m_7 z^7

Polynomial    m0  m1  m2  m3  m4  m5  m6  m7

[Table entries illegible in the source.]

Minimal Irreducible Polynomials over GF(2^6)

m_i(z) = m_0 + m_1 z + m_2 z^2 + m_3 z^3 + m_4 z^4 + m_5 z^5 + m_6 z^6

Polynomial    m0  m1  m2  m3  m4  m5  m6

[Table entries illegible in the source.]


Table IV-3

Minimal Irreducible Polynomials over GF(2^5)

m_i(z) = m_0 + m_1 z + m_2 z^2 + m_3 z^3 + m_4 z^4 + m_5 z^5

Polynomial    m0  m1  m2  m3  m4  m5

m0(z)          1   1   0   0   0   0
m1(z)          1   0   1   0   0   1
m3(z)          1   0   1   1   1   1
m5(z)          1   1   1   0   1   1
m7(z)          1   1   1   1   0   1
m11(z)         1   1   0   1   1   1
m15(z)         1   0   0   1   0   1

Minimal Irreducible Polynomials over GF(2^4)

m_i(z) = m_0 + m_1 z + m_2 z^2 + m_3 z^3 + m_4 z^4

Polynomial    m0  m1  m2  m3  m4

m0(z)          1   1   0   0   0
m1(z)          1   1   0   0   1
m3(z)          1   1   1   1   1
m5(z)          1   1   1   0   0
m7(z)          1   0   0   1   1


To facilitate the description of the residue calculator's
architecture, it is advantageous to examine the structure used to
implement polynomial division. Figure 3 shows the structure of an
LFSR that is used to perform polynomial division. (A detailed
description of the operation of this circuit can be found in Chapter
7 of reference [1].) The circuit shown is designed to divide by
the polynomial M_95 = x^8 + x^7 + x^4 + x^3 + x^2 + x + 1, which is a
minimal polynomial from GF(2^8). The positions of the feedback taps are
determined by the coefficients of the divisor polynomial. In order
to perform division, the registers are all cleared to zero, and the
data representing the polynomial to be divided is fed sequentially
into the shift register. After the last symbol is entered, the
remainder polynomial, or residue polynomial, is stored in the registers
of the divider circuit. This residue, R(x) = R_0 + R_1 x + ... + R_7 x^7,
is required for completion of the fast polynomial evaluation algorithm.

The structure shown in Figure 3 is designed to operate with
symbols from GF(2^8): the data lines are eight wide, the delay stages
are eight registers deep, and the Exclusive-OR circuits operate with
eight-bit words. Since the divisor polynomial contains only zero
or one as coefficients, the circuit shown in Figure 3 can be
interpreted as eight identical binary feedback shift registers (BFSRs),
each circuit capable of accommodating a single bit of each eight-bit
input symbol. Each of the eight BFSRs contains delay stages that
are only one bit deep, and the modulo-two adders are two-input binary
Exclusive-OR gates, forming eight identical "slices", each physically
separate from its seven companions.
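
The slice interpretation can be checked with a small behavioral model.
The sketch below simulates a division LFSR for the divisor
x^8 + x^7 + x^4 + x^3 + x^2 + x + 1 quoted above, once on whole
eight-bit symbols and once as eight independent one-bit registers. It
is a software model under the usual shift-and-feedback convention
(symbols entered highest order first), not a gate-level description of
Figure 3, and the function names are illustrative.

```python
DIVISOR = 0b110011111  # x^8 + x^7 + x^4 + x^3 + x^2 + x + 1; bit i = coeff of x^i


def lfsr_divide(symbols, divisor: int = DIVISOR):
    """Feed coefficients (highest order first) through a division LFSR.

    Returns the register contents; entry i is the coefficient of x^i
    of the remainder. Symbols may be single bits or multi-bit words,
    because the feedback taps only ever XOR the fed-back value.
    """
    deg = divisor.bit_length() - 1
    taps = [(divisor >> i) & 1 for i in range(deg)]
    reg = [0] * deg                      # registers cleared before division
    for s in symbols:
        fb = reg[-1]                     # output of the last stage
        for i in range(deg - 1, 0, -1):  # shift, adding feedback at each tap
            reg[i] = reg[i - 1] ^ (fb if taps[i] else 0)
        reg[0] = s ^ (fb if taps[0] else 0)
    return reg


def sliced_divide(symbols, divisor: int = DIVISOR, width: int = 8):
    """Same division carried out as `width` separate one-bit BFSRs."""
    deg = divisor.bit_length() - 1
    result = [0] * deg
    for bit in range(width):
        slice_in = [(s >> bit) & 1 for s in symbols]
        for i, r in enumerate(lfsr_divide(slice_in, divisor)):
            result[i] |= r << bit        # reassemble the eight-bit symbol
    return result


data = [0x53, 0xCA, 0x01, 0xFF, 0x10, 0x8E, 0x00, 0x27, 0x99, 0x3C, 0x42]
assert sliced_divide(data) == lfsr_divide(data)
```

Because no slice ever reads a bit belonging to another slice, unused
slices can be fed zeros without disturbing the others.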

All 35 minimal polynomials from GF(2^8) can be implemented using
circuits that are similar to the structure shown in Figure 3. In
order to implement the polynomial division in GF(2^8), a total of eight
identical slices of hardware is required for each polynomial. Each
slice contains 35 different BFSRs with each shift register having a
maximum length of eight stages.

There are two fundamental problems associated with designing the
35 divider circuits required for operation in GF(2^8) to be
reconfigurable to operate in the other finite fields. First of all, the
different fields of operation have symbol sizes that range from four to
eight bits, and the divider circuits must be capable of operating with
these different symbol sizes. Secondly, the circuits must be
reconfigurable to provide for division by different divisor polynomials.
The positions of the feedback taps as well as the lengths of the
registers are subject to reconfigurability. Both problems are made more
difficult by the desire to design the divider circuits to be as
versatile as possible while keeping the total amount of hardware at a
reasonable level, without incurring a large overhead for
reconfigurability.

The necessity to operate with different symbol sizes is a
consideration that recurs throughout the design of both the transform
section and errata-location section. Our approach is to define a
standard symbol size of eight bits and design all hardware to
accommodate this symbol size and to be programmable for smaller fields.
Since the hardware must accommodate symbols from GF(2^8), no additional
hardware is required when defining an eight-bit standard symbol,
but some overhead is incurred for reconfigurability.

Any symbol from the field GF(2^m) can be represented as an m-bit
sequence

    \alpha^k \in GF(2^m) \leftrightarrow \{a_0^k, a_1^k, \ldots, a_{m-1}^k\}

where

    a_i^k \in GF(2),   i = 0, 1, \ldots, m-1.                             (3-8)

Any symbol from GF(2^m), where m ≤ 8, can be represented as a binary
eight-tuple where some of the bits are set to zero. We have defined
our standard symbol as an eight-bit word such that α^k is represented
as

    \alpha^k \in GF(2^m) \leftrightarrow \{\underbrace{a_0^k, a_1^k, \ldots, a_{m-1}^k}_{m}, \underbrace{0, \ldots, 0}_{8-m}\}      (3-9)

The notation of equation (3-9) will be frequently referred to as our
"standard" symbol in this report.

When the residue calculator is operating with symbols from GF(2^m),
where m<8, our definition of standard symbol size results in zeros
being fed into the 8-m slices corresponding to the bit-positions
greater than m-1. The operation of the BFSRs associated with these
zeros has no effect on the m slices required for the desired division.

The problem of designing the 35 divider circuits to be reconfigurable
to provide division by all necessary minimal polynomials reduces
to the problem of designing 35 BFSRs to be reconfigurable for the
required division. The design can then be repeated eight times to
obtain the parallel structure for the eight-bit polynomial residue
calculator.

A goal associated with the design of a reconfigurable divider
circuit is to minimize the amount of hardware required for
programmability. The design must be reconfigurable to accommodate
different divisor polynomials of varying length. Minimizing the amount
of hardware required for reconfigurability is roughly equivalent to
minimizing the number of programmable feedback taps. Each programmable
feedback tap allows the connection of a shift register's output to the
particular stage where the tap is located. The hardware associated
with a programmable tap must be repeated on all eight slices, and
accompanying discrete logic or memory must be dedicated to control
the operation of each programmable tap.

Two techniques can be combined to minimize the amount of hardware
required for programming the BFSRs. Both techniques exploit the fact
that the maximum degree of a minimal polynomial from GF(2^m) is m.
The divider circuits are designed originally to implement simultaneous
division by the 35 minimal polynomials in GF(2^8). A subset of these
circuits is required to be reconfigurable in order to implement division
by the 19 minimal polynomials in GF(2^7). A second subset of the
original circuits is required to be reconfigurable for operation in
GF(2^6). A third and fourth subset are required for operation in GF(2^5)
and GF(2^4). Each of the four subsets requires BFSRs that are shorter
in length than the eight stages required for operation in GF(2^8).

The first minimization design technique is to group minimal
polynomials from different fields that have identical or similar
coefficients. Programmable taps are only required where discrepancies
between tap weights occur. The second minimization technique is to
design the output tap of each shift register to be programmable so
that division by polynomials of different degrees can be implemented
in the same circuit.

An illustrative example helps to clarify these concepts. Figure
4-a shows the logic-level design of a BFSR that represents a single
slice of a divider circuit suitable for use in the polynomial residue
calculator. The circuit consists of an eight-stage feedback shift
register whose output tap can be selected from one of five locations.
The output multiplexer selects the position of the output tap that
defines the feedback connection polynomial (divisor polynomial). The
reconfigurability of this circuit for division in the different fields
is shown in Table V.


Figure 4. Programmable Binary Feedback Shift Register


Table V

Programmability of Binary Feedback Shift Register: Figure 4


Each of the original 35 divider circuits can be designed in a
manner similar to that shown in Figure 4. Unfortunately, there is
no readily apparent systematic method for assigning subsets and
feedback taps. However, the design methodology results in
hardware-efficient structures. The (51,k) breadboard to be described in
section IV was designed to accommodate a large subset of the codes that
can be processed by the (255,k) encoder and decoder. The breadboard's
polynomial residue calculator was designed using the minimization
techniques described in this section, and only three programmable
taps were required, one on each of three separate divider circuits.

The fundamental structure of the polynomial residue calculator
consists of eight identical slices of hardware. Each slice consists
of 35 BFSRs, each being reconfigurable to provide for division by a
set of different-length divisor polynomials. The maximum-length
divisor polynomial that can be implemented has degree eight;
consequently, the maximum length of any feedback shift register is
eight. The polynomial to be divided is fed sequentially into all 35
divider circuits. Division is completed after the last coefficient of
the data polynomial has been entered. At this time, the calculated
residue polynomials are stored within the delay stages of the divider
circuits. These residue polynomials are transferred into a temporary
holding memory, and the divider circuits are available for processing
the next block of data. The residue polynomials are then available in
temporary memory for further processing for completion of the fast
polynomial evaluation algorithm. In this manner, the polynomial residue
calculator can be thought of as pipelined, capable of simultaneously
operating on two contiguous blocks of n symbols, thus accepting a
continuous input stream.

3.2.1.2 Polynomial Residue Evaluator. The polynomial residue evaluator
implements the second portion of the polynomial-evaluation algorithm.
For operation in GF(2^m) (regardless of code block size), the polynomial
residue calculator provides the residue evaluator with the remainder
polynomials that result from the division of the input data sequence
by all the minimal polynomials in GF(2^m). The residue evaluator
sequentially calculates each point in the n-point transformation by
selecting a predetermined residue polynomial and evaluating that
residue at a root of its corresponding divisor polynomial. By
definition, the point of evaluation is a power of α^K. If the transform
is of maximum length, n = 2^m - 1, then the residue associated with each
minimal polynomial in the field will be used at least once. If the
transform is submaximum, n | 2^m - 1, then the residues associated with
a subset of the minimal polynomials from GF(2^m) will be used. Table
VI indicates the subsets of minimal polynomials associated with each
of the transforms listed in Table III.

Table VI (Concluded)

Transforms over GF(2^m)

Field of       Transform    Required Minimal Polynomial
Calculation    Length n     Divisors

GF(2^6)        63           m0(z), m1(z), m3(z), m5(z), m7(z), m9(z),
                            m11(z), m13(z), m15(z), m21(z), m23(z),
                            m27(z), m31(z)
               21           m0(z), m3(z), m9(z), m15(z), m21(z), m27(z)
               9            m0(z), m7(z), m21(z)
               7            m0(z), m9(z), m27(z)
               3            m0(z), m21(z)

GF(2^5)        31           m0(z), m1(z), m3(z), m5(z), m7(z), m11(z),
                            m15(z)

GF(2^4)        15           m0(z), m1(z), m3(z), m5(z), m7(z)
               5            m0(z), m3(z)
               3            m0(z), m5(z)


The only difference between the computation of a forward and an
inverse n-point transform over GF(2^m) is the order in which the
transformed symbols are calculated. This is a bookkeeping matter
irrelevant to the architecture and operation of the residue calculator.
The order of evaluation is determined by the arithmetic controller. For
each transform point, the controller (see section 3.2.1.3) provides
the residue evaluator with all the information needed to implement
the polynomial evaluation algorithm.

A block diagram of the polynomial residue evaluator is shown in
Figure 5. For each point in the transform, the polynomial residue
evaluator performs two major operations. First, the input multiplexer
selects a residue polynomial from the residue calculator. Secondly,
the residue evaluator calculates each point in the transform by
evaluating the selected residue using a multiplier and accumulator
defined for GF(2^m). The central components of the polynomial residue
evaluator are the GF(2^m) multiplier and accumulator. The remainder
of this section will concentrate on a description of their design and
operation.

The residue evaluator implements polynomial evaluation using the
continued product expansion of equation (3-7). The expansion is
well-suited for sequential implementation using the GF(2^m) multiplier
and accumulator shown in Figure 5. In order to compute all the
transforms shown in Table III, the residue evaluator must be capable of
operation in all five binary extension fields. Using our standard
symbol notation, equation (3-9), the accumulator is easily implemented
using eight two-input Exclusive-OR gates. (The padding of
zeros for fields with m<8 automatically produces the correct results.)
The multiplier structure chosen for use in the residue evaluator is
an asynchronous array GF(2^m) multiplier (described in appendix A of
this report). This multiplier is the processing bottleneck within
the transform section, and the array-type structure offers the fastest
multiplication rates. However, a penalty is paid for this speed
because the hardware implementation of the array-type structure
requires the maximum number of gates of all GF(2^m) multiplier
structure alternatives. Note, however, that the GF(2^m) multiplier
required for the residue evaluator section is used to perform the only
extension-field multiplications in the transform architecture. Also, it
will be shown that exactly the same GF(2^m) multiplier structure is used
to implement a critical portion of the errata-location section. The
careful analysis and design of this multiplier results in a functional
module whose usefulness overshadows the disadvantages associated with
its complexity.


Figure 5. Polynomial Residue Evaluator

The operation of the polynomial residue evaluator can be described
with an example. To calculate a point in the transform, the
arithmetic controller supplies the polynomial residue calculator with
(1) the necessary information to obtain the predetermined residue
polynomial, (2) the degree of that particular residue polynomial, ℓ,
and (3) the power of the kernel, α^p, at which the residue is to
be evaluated. The residue evaluator is required to implement the
expression

    R(\alpha^p) = R_0 + R_1 \alpha^p + \cdots + R_{\ell-1} \alpha^{(\ell-1)p}      (3-10)

The coefficients of the residue polynomial are stored in a temporary
memory within the residue calculator. The information provided by
the arithmetic controller selects the appropriate memory locations,
and serially feeds these coefficients, most significant coefficient
first, into the multiplier and accumulator circuitry. The input
latch (see Figure 5) is initially cleared to zero, and therefore the
output of the programmable GF(2^m) multiplier is also zero. The point
of evaluation, α^p, is latched into the evaluation latch. The most
significant coefficient of the residue polynomial is fed unchanged
through the accumulator and latched in the input latch. After
processing delay, the output of the asynchronous multiplier is
(R_{ℓ-1} α^p). This output is fed back into one input of the
accumulator. Simultaneously, the next most significant coefficient is
retrieved from the temporary memory and fed to the accumulator. The
next output of the accumulator, (R_{ℓ-1} α^p + R_{ℓ-2}), is held in the
input latch and asynchronously multiplied with α^p. This process
continues (ℓ-1) times until the input latch contains

    (\cdots(R_{\ell-1} \alpha^p + R_{\ell-2}) \alpha^p + \cdots + R_1) \alpha^p + R_0 = R(\alpha^p)      (3-11)

which is the evaluated polynomial. This data, or transformed symbol,
is latched into the residue evaluator's output latch where it can be
shifted out of the transform section for further processing. The
entire process is repeated n times in order to compute an n-point
transform. After each symbol is calculated, the input latch (Figure
5) must be cleared to zero.

3. 2. 1.3  Arithmetic  Controller.  The  arithmetic  controller  provides 
all  timing  and  control  signals  required  to  operate  the  transform  sec¬ 
tion.  As  mentioned  previously,  the  calculations  implemented  by  both 
the  polynomial  residue  calculator  and  polynomial  residue  evaluator  are 
independent  of  whether  a  forward  or  an  inverse  transformation  is  per¬ 
formed.  The  arithmetic  controller  determines  the  order  of  the  evalu¬ 
ation  and  therefore  dictates  the  type  of  transform  to  be  computed. 

The arithmetic controller requires specific input data to manage
the transformer. The controller needs to know whether the transformer
is to be used for encoding (forward transform) or decoding (inverse
transform). Also, the controller needs to know which code is being
processed and the field in which the code is defined. From this
information, the arithmetic controller generates the order of
computation for the transform and its kernel.

The arithmetic controller provides minimal control to the polynomial
residue calculator. For operation in a particular field, the
controller reconfigures the LFSRs to divide by all the minimal
polynomials within that field. The controller selects the shift
register's output tap (via the output multiplexer) for each divider
circuit and also programs the necessary feedback taps. The controls
for the polynomial residue calculator are static; once the code and
field of operation are defined, the circuits are programmed and they
remain unchanged for the duration of the transform calculation.

Primarily, the arithmetic controller manages the polynomial
residue evaluator. Once the field of operation is defined, the
controller reconfigures the $GF(2^m)$ multiplier to operate in that
field. The multiplier remains in this configuration for the duration
of the transform calculation. However, for each point in the
transform, the controller must supply the residue calculator with the
data required to select the predetermined residue polynomial. The
controller must also provide the degree of that residue as well as the
point of evaluation. These sets of control signals are dynamic;
they change for the calculation of each transform point.

The arithmetic controller could be implemented with any of a
number of hardware structures, including microprocessor-controlled
hardware. However, high throughput in the transformer warrants a
high-speed controller. The controller architecture that is described
in the following paragraphs was implemented in the (51, k) encoder
and decoder breadboard.

The major function that must be implemented by the controller
is the dynamic generation of the data required to calculate each
individual transformed symbol. The static control required for each
code can be easily generated with discrete combinational logic. A
block diagram of an arithmetic controller is shown in Figure 6.


Figure 6. Arithmetic Controller


The inputs for the controller are an encode/decode signal, the field
of operation $GF(2^m)$, and the chosen code parameters (n,k). For each
transformed symbol, the arithmetic controller generates three pieces
of information: the address used by the residue evaluator's input
multiplexer to select the predetermined residue polynomial, the degree
of that residue polynomial, and the field element at which the
polynomial is to be evaluated. The controller consists of preprogrammed
memory and a programmable memory address generator. The memory is
partitioned into two separate storage areas that have a common address.
One memory section contains the field elements associated with each
field and the other contains the information required to select and
evaluate the residue polynomials. Within a given field, a particular
element is a root of only one minimal polynomial. Therefore, for
polynomial evaluation there exists a one-to-one relationship between
any field element and its associated residue polynomial. When a field
element is selected in one memory section, the data associated with
its residue polynomial is selected in the other memory.

There are $2^m - 1$ nonzero field elements in the field $GF(2^m)$.
These elements can be designated as $\alpha^0, \alpha^1, \alpha^2,
\ldots, \alpha^{2^m-2}$. Our eight-bit standard symbol representation
(equation (3-9)) of each element, $\alpha^i$, from $GF(2^m)$ is stored
in memory location $2^m + i$. The field element $\alpha^0$ from
$GF(2^m)$ is an important evaluation point. It is stored in memory
location $2^m$ and it is also stored in the memory location
$2^m + 2^m - 1$, or $2^{m+1} - 1$. The residue-polynomial information
corresponding to each field element is similarly stored in the second
memory.

The programmable memory address generator consists of an
initialization circuit, a transform kernel generator, and a
programmable up-down counter. For a forward n-point transform over
$GF(2^m)$, the field elements required for evaluation in the fast
polynomial algorithm are $(\alpha^0, \alpha^K, \alpha^{2K}, \ldots,
\alpha^{(n-1)K})$. The memory addresses corresponding to these field
elements are generated by initializing the up-down counter to the
memory location $2^m$, and then incrementing the counter by K for each
point in the transform. For an inverse n-point transform over
$GF(2^m)$, the field elements required for evaluation are
$(\alpha^0, \alpha^{(n-1)K}, \alpha^{(n-2)K}, \ldots, \alpha^{2K},
\alpha^K)$. The memory addresses corresponding to these elements are
generated by initializing the up-down counter to memory location
$2^{m+1} - 1$ and then decrementing the counter by K for each point in
the transform.
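
The addressing scheme above can be illustrated with a short Python
sketch. It is not the breadboard's controller: the function names are
hypothetical, the step K is assumed to equal (2^m - 1)/n, and the small
field GF(2^4) with primitive polynomial x^4 + x + 1 is chosen only to
keep the numbers readable.

```python
def build_field_memory(m, prim_poly):
    """Map address 2^m + i to alpha^i for i = 0 .. 2^m - 1 (alpha = x)."""
    size = 1 << m
    memory = {}
    elem = 1                         # alpha^0
    for i in range(size):            # i = 2^m - 1 wraps around to alpha^0 again
        memory[size + i] = elem
        elem <<= 1                   # multiply by alpha
        if elem & size:              # reduce modulo p(x)
            elem ^= prim_poly
    return memory


def evaluation_addresses(m, n, inverse=False):
    """Addresses produced by the up-down counter for an n-point transform."""
    size = 1 << m
    k_step = (size - 1) // n         # assumed: K = (2^m - 1) / n
    if inverse:
        start, step = 2 * size - 1, -k_step   # start at 2^(m+1) - 1, count down
    else:
        start, step = size, k_step            # start at 2^m, count up
    return [start + j * step for j in range(n)]


memory = build_field_memory(4, 0b10011)       # GF(2^4), p(x) = x^4 + x + 1
forward = evaluation_addresses(4, 5)
inverse = evaluation_addresses(4, 5, inverse=True)
```

Because alpha^0 is stored at both ends of the table, the forward and
inverse sequences both begin at the element 1 and neither counter ever
needs to wrap around.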

3.2.2  The  Errata-Location  Section 

The errata-location section implements a modified version of the
Berlekamp-Massey minimal-length shift register synthesis algorithm
to correct symbol errors and symbol erasures. First, this section
uses the known erasure locations to calculate the erasure-locator
polynomial. Then, the same hardware uses the error syndrome values
to iteratively calculate the errata-locator polynomial. Finally,
the errata-locator polynomial is used to generate the transform of
the errata pattern, which is subtracted from the transform of the
received codeword in order to recover the original message.

The modified Berlekamp-Massey decoding algorithm was presented
in Volume 1 and was reviewed in section II of this volume. In order
to describe the hardware required to implement the errata-location
section we define the decoding algorithm as a step-by-step procedure
and then describe the implementation of each computation. This
detailed decoding procedure is shown in Figure 7 and uses the notation
defined in Table VII. The procedure of Figure 7 uses a forward
transform for decoding (an inverse transform is defined for encoding),
and the error syndrome symbols are defined as the first n-k symbols in
the transform of the received sequence. (During encoding, the k
information symbols are the k highest coefficients in the (n-1)th
degree message polynomial.)


Figure  7:  Decoding  Algorithm 


Table VII

Decoding Algorithm Variables (Notation)

Volume I                  Volume II, Figure 7    Description
Notation                  Notation
------------------------  ---------------------  ------------------------------
$z$                       $x = z^{-1}$           Change in Variable
$d_r$                     $d^{(N)}$              Discrepancy
$d_m$                     $b^{(N)}$              Previous Discrepancy
[illegible]$^{(r+1)}(x)$  $\Lambda^{(N)}(x)$     Present Connection Polynomial
[illegible]$^{(m)}(x)$    $\beta^{(N)}(x)$       Previous Connection Polynomial
$L$                       $L^{(N)}$              Length of Present Connection
                                                 Polynomial
[illegible]               [illegible]            Known Erasure Locations

The decoding algorithm shown in Figure 7 is required to operate
for n iterations. During the first $\nu$ iterations, the known erasure
locations are used to construct sequentially the erasure-locator
polynomial. When the algorithm first branches to step (6), the
polynomials $\Lambda^{(\nu)}(x)$ and $\beta^{(\nu)}(x)$ are both the
erasure-locator polynomial. At the conclusion of the (n-k)th iteration,
the polynomial $\Lambda^{(n-k)}(x)$ is the errata-locator polynomial.
This polynomial is held constant for the remainder of the algorithm and
the generated set $\{M_0, M_1, \ldots, M_{k-1}\}$ is the recovered
information.

A block diagram of the errata-location section is shown in
Figure 8. The decision and control circuitry implied by the decoding
algorithm are not shown in this figure but are implicit in the
circuit operation. Both the algorithm presented in Figure 7 and
the structure shown in Figure 8 are independent of the field of
operation; the arithmetic operations specified in the algorithm and
implemented in the block diagram are implicitly field-dependent.


Three major computational steps implement the decoding algorithm
as shown in Figure 7:

•  The first computation corresponds to step (6) and is the
   calculation of the present discrepancy, $d^{(N)}$.

•  The second calculation corresponds to steps (3) and (8)
   and is the calculation of the present feedback connection
   polynomial, $\Lambda^{(N)}(x)$.

•  The remaining calculation updates the previous feedback
   connection polynomial $\beta^{(N)}(x)$ to one of three values:
   $x\beta^{(N-1)}(x)$, $\Lambda^{(N-1)}(x)$, or $\Lambda^{(N)}(x)$.

The present discrepancy is always calculated prior to calculating the
present feedback connection polynomial, which in turn is calculated
before updating the previous feedback connection polynomial. The
hardware that implements these calculations is described in the
remainder of this section.


Figure 8. Errata Location Section

3.2.2.1 Calculation of $d^{(N)}$. During each of the first n-k
iterations of the decoding algorithm, the present discrepancy
$d^{(N)}$ must be generated. For the first $\nu$ iterations, the known
erasure locations are substituted for $d^{(N)}$. The erasure locations
serve as inputs to the present discrepancy latch (see Figure 8). Once
all erasure locations have been used as inputs, the present
discrepancy is calculated as

$$ d^{(N)} = S_N + \sum_{i=1}^{L^{(N-1)}+\nu} \Lambda_i^{(N-1)} S_{N-i} \tag{3-12} $$

which corresponds to the convolution of the $L^{(N-1)} + \nu + 1$ most
recent syndrome values $\{S_N, S_{N-1}, \ldots,
S_{N-L^{(N-1)}-\nu}\}$ and the coefficients of the present
feedback-connection polynomial $\Lambda^{(N-1)}(x)$. The portion of
the structure shown in Figure 8 that is used to implement this
correlation is repeated in Figure 9. In this figure, the
$L^{(N-1)} + \nu + 1$ most recent syndrome values are held in the
syndrome register, while the coefficients of $\Lambda^{(N-1)}(x)$ are
held in the present feedback-polynomial register. Each of the syndrome
values is multiplied with a corresponding coefficient of the feedback
connection polynomial and the $L^{(N-1)} + \nu + 1$ product terms are
summed to produce $d^{(N)}$. Each of the pairwise products between the
syndrome register and the connection polynomial register is a
multiplication in $GF(2^m)$, and $L^{(N-1)} + \nu + 1$ field-dependent
multipliers are required for this direct implementation. The structure
shown in Figure 9 illustrates the computational steps required to
calculate $d^{(N)}$; the hardware is complex and may be simplified.


Figure 9. Present Discrepancy Calculator


The (255,k) encoder and decoder's present discrepancy calculator
is shown in Figure 10. This structure implements sequential
finite-field multiplication so that only one $GF(2^m)$ multiplier
structure is required. In Figure 10, each syndrome value and polynomial
coefficient is represented using our standard eight-bit symbol
notation (equation 3-9). The calculation of the present discrepancy,
equation (3-12), can be expanded as

$$ d^{(N)} = S_N + \sum_{i=1}^{L^{(N-1)}+\nu} \sum_{k=0}^{7} S_{N-i,7-k} \left( \sum_{\ell=0}^{7} \Lambda_{i,\ell}^{(N-1)} x^{\ell} \right) x^{7-k} \bmod p(x) \tag{3-13a} $$

$$ = S_N + \sum_{k=0}^{7} \left[ \sum_{i=1}^{L^{(N-1)}+\nu} S_{N-i,7-k} \left( \sum_{\ell=0}^{7} \Lambda_{i,\ell}^{(N-1)} x^{\ell} \right) \right] x^{7-k} \bmod p(x) \tag{3-13b} $$

where $p(x)$ is the primitive polynomial that generates $GF(2^m)$,
$S_{N-i,k}$ is the k-th bit in the eight-bit representation of
$S_{N-i}$, and $\Lambda_{i,\ell}^{(N-1)}$ is the $\ell$-th bit in the
eight-bit representation of $\Lambda_i^{(N-1)}$. This equation is
implemented directly in Figure 10 and this structure can be described
in terms of its operation. During the first n-k iterations of the
decoding algorithm, the syndrome values are used to calculate
$d^{(N)}$ and switch 1 in Figure 10 is in position "1".

During the N-th iteration ($N \le n-k$), $S_N$ is fed sequentially
into the present discrepancy calculator. Simultaneously, each syndrome
value already present in the syndrome register is shifted one stage to
the right. Each syndrome value, $S_i$, is stored so that $S_{i,7}$ is
the first bit shifted out. There are eight shifts required to shift
each syndrome value and these shifts correspond to the summation over
k in equation (3-13b). During each shift, a single bit of each
syndrome value is multiplied modulo-2 with every bit from the
corresponding pairwise product-polynomial coefficient
$\Lambda_{i,\ell}^{(N-1)}$, $\ell = 0, 1, \ldots, 7$, and the eight
partial products, $P_{i,0}, P_{i,1}, \ldots, P_{i,7}$, from each of
the $L^{(N-1)} + \nu$ stages are fed into the accumulator. For k=0,
the summation

$$ \sum_{i=1}^{L^{(N-1)}+\nu} S_{N-i,7} \left( \sum_{\ell=0}^{7} \Lambda_{i,\ell}^{(N-1)} x^{\ell} \right) \tag{3-14a} $$

$$ = \sum_{\ell=0}^{7} \left( \sum_{i=1}^{L^{(N-1)}+\nu} S_{N-i,7} \Lambda_{i,\ell}^{(N-1)} \right) x^{\ell} \tag{3-14b} $$

is formed in the serial multiplier's accumulator. Equation (3-14b)
represents a polynomial whose $\ell$-th coefficient is given by

$$ P_{\ell} = \sum_{i=1}^{L^{(N-1)}+\nu} S_{N-i,7} \Lambda_{i,\ell}^{(N-1)} \tag{3-15} $$

The eight coefficients $P_0, P_1, \ldots, P_7$ are fed into the
$GF(2^m)$ serial multiplier where multiplication with x and polynomial
reduction modulo $p(x)$ occur. This product is stored in the
multiplier's output latch. (A description of the $GF(2^m)$ serial
multiplier is given in appendix A of this volume.) During the k=1
shift, the summation

$$ \sum_{i=1}^{L^{(N-1)}+\nu} S_{N-i,6} \left( \sum_{\ell=0}^{7} \Lambda_{i,\ell}^{(N-1)} x^{\ell} \right) + \left[ \sum_{i=1}^{L^{(N-1)}+\nu} S_{N-i,7} \left( \sum_{\ell=0}^{7} \Lambda_{i,\ell}^{(N-1)} x^{\ell} \right) \right] x \bmod p(x) \tag{3-16} $$

is formed in the serial multiplier's accumulator. This term is fed
to the $GF(2^m)$ serial multiplier, and the output is stored in the
multiplier latch. This process continues and after seven shifts the
$GF(2^m)$ serial multiplier latch contains the term

$$ \sum_{k=0}^{6} \left[ \sum_{i=1}^{L^{(N-1)}+\nu} S_{N-i,7-k} \left( \sum_{\ell=0}^{7} \Lambda_{i,\ell}^{(N-1)} x^{\ell} \right) \right] x^{7-k} \bmod p(x) \tag{3-17} $$

During the k=7 shift, the contents of the multiplier latch, equation
(3-17), are accumulated with

$$ \sum_{i=1}^{L^{(N-1)}+\nu} S_{N-i,0} \left( \sum_{\ell=0}^{7} \Lambda_{i,\ell}^{(N-1)} x^{\ell} \right) \tag{3-18} $$

and the sum is fed to the discrepancy accumulator, where $S_N$ is added
and the present discrepancy is formed. The architecture shown in
Figure 10 requires only binary, GF(2), logic for each of the
$L^{(N-1)} + \nu$ convolver stages. All field-dependent operations are
implemented in the single $GF(2^m)$ serial multiplier.
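
The equivalence between the direct form (3-12) and the bit-serial form
(3-13b) through (3-18) can be checked with a small Python sketch. It
models the arithmetic only, not the circuit of Figure 10. The function
names are hypothetical, and the primitive polynomial
x^8 + x^4 + x^3 + x^2 + 1 is an assumption made for illustration; the
report's own p(x) and bit ordering are fixed by equation (3-9), which
is not reproduced here.

```python
M = 8
PRIM_POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1 (assumed for this sketch)


def times_x(a):
    """Multiply a field element by x and reduce modulo p(x)."""
    a <<= 1
    return a ^ PRIM_POLY if a & (1 << M) else a


def gf_mul(a, b):
    """Product of two field elements by shift-and-add (Horner over bits of b)."""
    result = 0
    for bit in range(M - 1, -1, -1):
        result = times_x(result)
        if (b >> bit) & 1:
            result ^= a
    return result


def discrepancy_direct(s_n, older, lam):
    """Equation (3-12); older[i-1] holds S_{N-i}, lam[i-1] holds Lambda_i."""
    d = s_n
    for s, coeff in zip(older, lam):
        d ^= gf_mul(coeff, s)          # one field multiplier per stage
    return d


def discrepancy_serial(s_n, older, lam):
    """Equations (3-13b) to (3-18): one syndrome bit per shift, bit 7 first."""
    latch = 0
    for k in range(M):
        bit = M - 1 - k
        acc = latch
        for s, coeff in zip(older, lam):
            if (s >> bit) & 1:         # GF(2) AND of one syndrome bit with a coefficient
                acc ^= coeff
        # the single serial multiplier: times x mod p(x), except after the last shift
        latch = times_x(acc) if k < M - 1 else acc
    return latch ^ s_n                 # discrepancy accumulator adds S_N
```

Only `times_x` depends on the field in the serial version, which
mirrors the observation that every convolver stage needs GF(2) logic
alone.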

3.2.2.2 Calculation of the Present Feedback Connection Polynomial,
$\Lambda^{(N)}(x)$. During each of the first n-k iterations of the
decoding algorithm, the present feedback connection polynomial
$\Lambda^{(N)}(x)$ must be updated. This polynomial can be revised to
one of two values,

$$ \Lambda^{(N)}(x) = \Lambda^{(N-1)}(x) \tag{3-19a} $$

or

$$ \Lambda^{(N)}(x) = \Lambda^{(N-1)}(x) - d^{(N)} b^{-(N-1)} x \beta^{(N-1)}(x) \tag{3-19b} $$

where $b^{-(N-1)}$ denotes the multiplicative inverse of the previous
discrepancy $b^{(N-1)}$. The portion of Figure 8 that updates
$\Lambda^{(N)}(x)$ is repeated in Figure 11. This structure consists of
the present connection polynomial register, the previous connection
polynomial register, the present and past discrepancy latches, a
field-element inversion circuit and a field-element multiplier.

The operation of the structure shown in Figure 11 can be described
for both possible update conditions shown in equations (3-19).
The first equation is trivial to implement. Prior to the N-th
iteration, the polynomial $\Lambda^{(N-1)}(x)$ is stored in the present
feedback polynomial register. If the conditions of the decoding
algorithm are such that $\Lambda^{(N)}(x)$ is revised in accordance
with equation (3-19a) then the contents of the present feedback
polynomial register are not changed. The implementation of equation
(3-19b) is more complex and it is best described by a three-step
procedure. First, the stored previous discrepancy $b^{(N-1)}$ is
inverted and multiplied with the present discrepancy $d^{(N)}$. The
inversion and multiplication are $GF(2^m)$ operations and they are
implemented using the field-element inversion circuit and $GF(2^m)$
multiplier shown in Figure 11. Next, the product
$d^{(N)} b^{-(N-1)}$ is multiplied with $x\beta^{(N-1)}(x)$. The
polynomial $\beta^{(N-1)}(x)$ is stored in the previous feedback
polynomial register and $x\beta^{(N-1)}(x)$ is formed by shifting each
coefficient of $\beta^{(N-1)}(x)$ one stage to the right. The product

$$ d^{(N)} b^{-(N-1)} x \beta^{(N-1)}(x) \tag{3-20} $$

is formed by multiplying the term $d^{(N)} b^{-(N-1)}$ with each
coefficient of the shifted version of $\beta^{(N-1)}(x)$ using the
corresponding $GF(2^m)$ multiplier. Finally, equation (3-19b) requires
that the polynomial formed in equation (3-20) be subtracted from the
polynomial $\Lambda^{(N-1)}(x)$. Since the coefficients of both
polynomials are from $GF(2^m)$, the arithmetic operation of subtraction
is equivalent to modulo-two addition. Therefore, the desired results
can be obtained as a coefficient-by-coefficient modulo-two addition of
the two polynomials. The resulting polynomial $\Lambda^{(N)}(x)$ is
then stored in the present feedback connection polynomial register.


Figure 11. Present Feedback Connection Polynomial Calculator

The diagram of Figure 11 illustrates the computational steps
required to update the present feedback connection polynomial.
However, this structure requires $L^{(N-1)} + \nu + 2$ $GF(2^m)$
multipliers, making its hardware implementation complex. The expanded
diagram shown in Figure 12 is functionally equivalent to the structure
shown in Figure 11, but the structure of Figure 12, which uses
sequential operation on symbol bits, has been designed so that only
two $GF(2^m)$ multipliers are used.

The first field-dependent structure calculates
$d^{(N)} b^{-(N-1)}$. This structure uses an inversion-by-squared
product algorithm to calculate the multiplicative inverse of
$b^{(N-1)}$ and then the same structure calculates
$d^{(N)} b^{-(N-1)}$. The heart of this circuit is a programmable
$GF(2^m)$ array multiplier that is identical to the array multiplier
designed for the transform section. This multiplier is combined with
latches and field-element squaring circuitry to calculate
$b^{-(N-1)}$. The same array multiplier is then used to calculate
the product $d^{(N)} b^{-(N-1)}$. The field-element inversion
algorithm and the details associated with the programmable array
multiplier and the field-element squaring circuit are contained in
appendix A of this volume.
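
The inversion algorithm itself is given in appendix A and is not
reproduced in this section. As a rough illustration of what an
inversion built from squarings and products can look like, the Python
sketch below uses the standard identity b^{-1} = b^{2^m - 2} =
b^2 · b^4 · ... · b^{2^{m-1}}, which holds in any GF(2^m). Whether the
appendix uses exactly this schedule is an assumption, as are the
function names and the primitive polynomial.

```python
M = 8
PRIM_POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1 (assumed for this sketch)


def gf_mul(a, b):
    """Product of two field elements, reduced modulo p(x)."""
    result = 0
    for bit in range(M - 1, -1, -1):
        result <<= 1
        if result & (1 << M):
            result ^= PRIM_POLY
        if (b >> bit) & 1:
            result ^= a
    return result


def gf_inv(b):
    """Inverse as the product of the successive squares b^2, b^4, ..., b^(2^(M-1))."""
    if b == 0:
        raise ZeroDivisionError("zero has no multiplicative inverse")
    square, inverse = b, 1
    for _ in range(M - 1):
        square = gf_mul(square, square)    # squaring circuit
        inverse = gf_mul(inverse, square)  # running product in the array multiplier
    return inverse


def scale_factor(d, b_prev):
    """The term d^(N) * b^-(N-1) that multiplies x*beta^(N-1)(x) in (3-19b)."""
    return gf_mul(d, gf_inv(b_prev))
```

The same multiplier is reused for every product in `gf_inv` and for the
final scaling, which matches the text's point that one array
multiplier serves both purposes.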

The second field-dependent portion of Figure 12 is designed to
simultaneously implement the product shown in equation (3-20) and the
coefficient-by-coefficient summation

$$ \Lambda_i^{(N)} = \Lambda_i^{(N-1)} + \delta^{(N)} \beta_{i-1}^{(N-1)} \tag{3-21} $$

where

$$ \delta^{(N)} = d^{(N)} b^{-(N-1)} $$

This equation can be expanded as

$$ \Lambda_i^{(N)} = \Lambda_i^{(N-1)} + \sum_{j=0}^{7} \beta_{i-1,j}^{(N-1)} \left( \sum_{k=0}^{7} \delta_k^{(N)} x^k \right) x^j \bmod p(x) \tag{3-22} $$

where $\beta_{i-1,j}^{(N-1)}$ is the j-th bit in our eight-bit
representation (equation 3-9) of the (i-1)th coefficient from
$\beta^{(N-1)}(x)$ and $p(x)$ is the primitive polynomial that
generates the field. The structure shown in Figure 12 implements
equation (3-22) directly with the summation over j implemented
sequentially in time. For each j the term

$$ \delta^{(N)} x^j \bmod p(x) \tag{3-23} $$

is multiplied with every coefficient of $\beta^{(N-1)}(x)$. The i-th
stage of the present feedback connection polynomial register contains
our eight-bit representation of $\Lambda_i^{(N-1)}$. For each j, a bit
from the eight-bit representation of $\beta_{i-1}^{(N-1)}$ is shifted
one stage to the right within the previous feedback connection
polynomial register. (These coefficients are stored so that the first
bit shifted out is $\beta_{i-1,0}^{(N-1)}$.) Simultaneously, with each
shift the j-th bit from $\beta_{i-1}^{(N-1)}$ is multiplied modulo-two
with the value shown in equation (3-23) and the product is accumulated
with $\Lambda_i^{(N-1)}$. This modulo-two multiplication and
accumulation is implemented by using the bits
$\beta_{i-1,j}^{(N-1)}$ as clocking gates for the single-bit
accumulators that are the building blocks of the present feedback
connection polynomial register (see Figure 12). After eight total
shifts, the present feedback connection polynomial has been updated in
accordance with equation (3-19b) and stored in the present feedback
connection polynomial register. Also after eight shifts, the previous
connection polynomial register contains $x\beta^{(N-1)}(x)$.

An example illustrates the operation of this structure (refer
to Figure 12). For j=0, $\delta^{(N)}$ is held in the $GF(2^m)$ serial
multiplier latch. The contents of this latch are fed to every stage of
the present feedback polynomial register. The i-th stage of the
present feedback polynomial register contains $\Lambda_i^{(N-1)}$.
During the j=0 shift, the bit $\beta_{i-1,0}^{(N-1)}$ is shifted from
the (i-1)th stage to the i-th stage of the previous feedback
connection polynomial register. Simultaneously this bit,
$\beta_{i-1,0}^{(N-1)}$, gates the clock for the i-th stage of the
present feedback polynomial register. The summation

$$ \Lambda_i^{(N-1)} + \delta^{(N)} \beta_{i-1,0}^{(N-1)} \tag{3-24} $$

is calculated and stored in place of $\Lambda_i^{(N-1)}$. During the
j=1 shift, the contents of the $GF(2^m)$ serial multiplier latch are
fed to the multiplier. The product $\delta^{(N)} x \bmod p(x)$ is
formed and stored in the multiplier latch. The bit
$\beta_{i-1,1}^{(N-1)}$ is shifted in the previous feedback connection
polynomial register and gated with the clock for the i-th stage of the
present feedback connection polynomial register. The summation

$$ \Lambda_i^{(N-1)} + \delta^{(N)} \beta_{i-1,0}^{(N-1)} + \left( \delta^{(N)} x \bmod p(x) \right) \beta_{i-1,1}^{(N-1)} \tag{3-25} $$

is formed and stored in the i-th stage of the present feedback
register. This process continues for eight total shifts and
$\Lambda^{(N)}(x)$ is calculated in accordance with equation (3-19b).
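
The bit-serial update can be compared against the direct form of
equation (3-19b) with a short Python sketch. It is an arithmetic model
under stated assumptions rather than a description of Figure 12:
polynomials are plain lists with index equal to degree, the function
names are hypothetical, and the primitive polynomial
x^8 + x^4 + x^3 + x^2 + 1 is assumed.

```python
M = 8
PRIM_POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1 (assumed for this sketch)


def times_x(a):
    """Multiply a field element by x and reduce modulo p(x)."""
    a <<= 1
    return a ^ PRIM_POLY if a & (1 << M) else a


def gf_mul(a, b):
    """Product of two field elements, reduced modulo p(x)."""
    result = 0
    for bit in range(M - 1, -1, -1):
        result = times_x(result)
        if (b >> bit) & 1:
            result ^= a
    return result


def update_direct(lam, beta, delta):
    """Lambda_i + delta * beta_{i-1} with one field multiplier per coefficient."""
    x_beta = [0] + list(beta)                       # x * beta(x)
    out = list(lam) + [0] * max(0, len(x_beta) - len(lam))
    for i, coeff in enumerate(x_beta):
        out[i] ^= gf_mul(delta, coeff)
    return out


def update_serial(lam, beta, delta):
    """Equation (3-22): eight shifts, bit beta_{i-1,j} gates stage i on shift j."""
    x_beta = [0] + list(beta)
    out = list(lam) + [0] * max(0, len(x_beta) - len(lam))
    latch = delta                                   # j = 0: latch holds delta
    for j in range(M):
        for i, coeff in enumerate(x_beta):
            if (coeff >> j) & 1:                    # gated clock: accumulate or hold
                out[i] ^= latch
        latch = times_x(latch)                      # delta * x^(j+1) mod p(x)
    return out
```

Every stage sees the same latch value on a given shift, so the only
field-dependent element in the update is the single `times_x` step.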
3.2.2.3 Calculation of the Previous Feedback Connection Polynomial,
$\beta^{(N)}(x)$. During each of the first n-k iterations of the
decoding algorithm, the previous feedback connection polynomial
$\beta^{(N)}(x)$ must be updated. The polynomial can be revised to one
of three values as indicated in equation (3-26).

$$ \beta^{(N)}(x) = x \beta^{(N-1)}(x) \tag{3-26a} $$

$$ \beta^{(N)}(x) = \Lambda^{(N-1)}(x) \tag{3-26b} $$

$$ \beta^{(N)}(x) = \Lambda^{(N)}(x) \tag{3-26c} $$

The conditions for determining the revised value of $\beta^{(N)}(x)$
depend upon the calculation of the present discrepancy, $d^{(N)}$, and
the revision of the present feedback connection polynomial,
$\Lambda^{(N)}(x)$. The procedure for revising $\beta^{(N)}(x)$ is
carried out only after the other two calculations have been concluded.
The hardware required for computing $\beta^{(N)}(x)$ consists of the
two feedback connection polynomial registers and the temporary
polynomial register (see Figure 8). The revision of $\beta^{(N)}(x)$
is closely related to the calculation of $\Lambda^{(N)}(x)$. During
each iteration of the decoding algorithm, $\Lambda^{(N-1)}(x)$ is
copied into the temporary polynomial register, and $d^{(N)}$ and
$\Lambda^{(N)}(x)$ are then calculated. Temporary memory is required
because the contents of the present connection polynomial register,
$\Lambda^{(N-1)}(x)$, may be altered during the calculation of
$\Lambda^{(N)}(x)$. After $\Lambda^{(N)}(x)$ has been calculated,
the previous connection polynomial register contains
$x\beta^{(N-1)}(x)$. If the conditions of the decoding algorithm are
such that $\beta^{(N)}(x)$ is updated in accordance with equation
(3-26a) then the revision is complete. If the conditions of the
algorithm are such that equation (3-26b) is valid, then the contents
of the temporary memory are transferred into the previous feedback
connection polynomial register and $\beta^{(N)}(x)$ is equal to
$\Lambda^{(N-1)}(x)$. Finally, if equation (3-26c) is to be
implemented, $\Lambda^{(N)}(x)$ is transferred through the temporary
memory into the previous feedback connection polynomial register.
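
The order of the three calculations in each iteration can be seen in
the following Python sketch of the textbook, errors-only
Berlekamp-Massey iteration, written with this section's variable names.
It is not the procedure of Figure 7: the branch conditions are the
standard ones from Massey's algorithm, the erasure initialization
(and with it case (3-26c)) is omitted, and GF(2^4) with
p(x) = x^4 + x + 1 is assumed to keep the example small.

```python
M, PRIM_POLY = 4, 0b10011  # GF(2^4), p(x) = x^4 + x + 1 (illustrative choice)


def gf_mul(a, b):
    """Product of two field elements, reduced modulo p(x)."""
    result = 0
    for bit in range(M - 1, -1, -1):
        result <<= 1
        if result & (1 << M):
            result ^= PRIM_POLY
        if (b >> bit) & 1:
            result ^= a
    return result


def gf_inv(a):
    """a^(2^M - 2), computed by repeated multiplication (adequate for a sketch)."""
    result = 1
    for _ in range((1 << M) - 2):
        result = gf_mul(result, a)
    return result


def berlekamp_massey(syndromes):
    """syndromes[0] is S_1. Returns (coefficients of Lambda, register length L)."""
    lam, beta = [1], [1]          # Lambda^(0)(x) = beta^(0)(x) = 1
    b, length = 1, 0              # previous discrepancy and L
    for n_iter in range(1, len(syndromes) + 1):
        # 1. present discrepancy, as in equation (3-12) with no erasures
        d = syndromes[n_iter - 1]
        for i in range(1, len(lam)):
            d ^= gf_mul(lam[i], syndromes[n_iter - 1 - i])
        temp = list(lam)          # temporary polynomial register
        x_beta = [0] + beta       # x * beta^(N-1)(x)
        # 2. present connection polynomial, (3-19a) or (3-19b)
        if d != 0:
            delta = gf_mul(d, gf_inv(b))
            lam = lam + [0] * max(0, len(x_beta) - len(lam))
            for i, coeff in enumerate(x_beta):
                lam[i] ^= gf_mul(delta, coeff)
        # 3. previous connection polynomial, (3-26b) on a length change, else (3-26a)
        if d != 0 and 2 * length <= n_iter - 1:
            beta, b, length = temp, d, n_iter - length
        else:
            beta = x_beta
    return lam, length
```

With two symbol errors and four syndromes, the returned polynomial
factors as (1 + X_1 x)(1 + X_2 x), where X_1 and X_2 are the error
locators.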
3.2.2.4 Symbol Errata Correction. The decoding algorithm shown in
Figure 7 requires n-k iterations to compute the errata-locator
polynomial. If the total number of errors and erasures is within the
bound of the code (equation 3-1), then the synthesized errata-locator
polynomial is unique. The synthesized errata-location section will
sequentially generate the transform of the errata pattern, which is
then subtracted from the transform of the received sequence to obtain
the original message.

Errata correction and information recovery occur during the last
k iterations of the decoding algorithm. Steps (10) through (13) in
the algorithm (see Figure 7) need to be implemented. The algorithm
branches to step (10) after n-k iterations. At this time, the present
feedback connection polynomial, $\Lambda^{(n-k)}(x)$, is the
synthesized errata-locator polynomial. For $N > n-k$, step (11) of the
decoding algorithm is

$$ S_N = \sum_{i=1}^{L^{(N-1)}+\nu} \Lambda_i^{(N-1)} S_{N-i} \tag{3-27} $$

Equation (3-27) defines the calculation required to generate the next
symbol in the transform of the errata pattern. In step (12), this
symbol is subtracted from (added modulo-two to) the N-th symbol in the
transform of the received sequence; the original information symbol
is recovered. During the last k iterations of the decoding algorithm,
the present feedback connection polynomial, $\Lambda^{(N)}(x)$,
remains unchanged.

The hardware that implements information recovery in the decoding
algorithm is contained within the present discrepancy calculator as
seen in Figure 10. The single switch controls the operation of the
hardware. During the first n-k iterations of the algorithm, switch
1 is in position "1", and the syndrome values are fed into the
syndrome register. During the last k iterations of the algorithm,
switch 1 is in position "2" and the input into the syndrome register
is in accordance with equation (3-27).

During the N-th iteration ($N > n-k$), $S_N$ is calculated in the
serial $GF(2^m)$ multiplier's accumulator (see Figure 10). This
calculation requires eight shifts and is implemented identically to
the calculation of $d^{(N)}$ (see section 3.2.2.1). Also during the
N-th iteration, the N-th symbol in the transform of the received
pattern, $R_N$, is entered into the present discrepancy calculator. On
the eighth shift of the N-th iteration, $S_N$ (as calculated in
equation (3-27)) is added modulo-two to $R_N$ in the $d^{(N)}$
accumulator. This accumulator implements

$$ R_N + S_N \tag{3-28} $$

and an original information symbol is recovered. This process is
repeated for k cycles; the entire original message is recovered.
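
The recursion of equation (3-27) and the correction of equation (3-28)
can be sketched in Python as follows. The sketch assumes an errors-only
case over GF(2^4) with p(x) = x^4 + x + 1, an already synthesized
locator polynomial, and hypothetical function names; it models the
arithmetic, not the switch and register arrangement of Figure 10.

```python
M, PRIM_POLY = 4, 0b10011  # GF(2^4), p(x) = x^4 + x + 1 (illustrative choice)


def gf_mul(a, b):
    """Product of two field elements, reduced modulo p(x)."""
    result = 0
    for bit in range(M - 1, -1, -1):
        result <<= 1
        if result & (1 << M):
            result ^= PRIM_POLY
        if (b >> bit) & 1:
            result ^= a
    return result


def extend_errata_transform(lam, known, total):
    """Equation (3-27): generate S_N from the previous values, N beyond the syndromes."""
    s = list(known)                       # s[0] is S_1
    while len(s) < total:
        nxt = 0
        for i in range(1, len(lam)):
            nxt ^= gf_mul(lam[i], s[-i])  # Lambda_i * S_{N-i}
        s.append(nxt)                     # fed back into the syndrome register
    return s


def recover_information(received_tail, errata_tail):
    """Equation (3-28): R_N + S_N, symbol by symbol (addition is XOR)."""
    return [r ^ s for r, s in zip(received_tail, errata_tail)]
```

In use, the first n-k entries of `known` would be the syndromes and the
remaining k generated values would be passed to `recover_information`
together with the last k symbols of the received transform.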

3. 3  Operational  Characteristics 

The  Reed-Solomon  (255, k)  encoder  and  decoder  is  designed  to 
operate  continuously  in  a  serial  input  data  mode.  The  processing 
time  required  for  both  encoding  and  decoding  can  be  described  in  terms 
of  the  operation  of  the  transform  section  and  the  errata- location 
section.  To  facilitate  this  description  it  is  advantageous  to  define 
a  machine  cycle  as  the  maximum  time  required  for  the  encoder  and 
decoder  to  complete  one  cycle  of  the  iterative  decoding  algorithm 
(see  Figure  7).  Figure  13  shows  the  tiir.  ig  requirements  associated 
with  the  operation  of  the  transform  and  errata-location  sections. 

The  sequential  calculation  of  an  n-point  transform  requires  2n 
machine  cycles.  During  each  of  the  first  n  machine  cycles,  a  symbol 
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Figure  13.  Timing  Requirements  for  (255,  k)  Encoder  and  Decoder 


from  the  sequence  to  be  transformed  is  fed  into  the  transform  section’s 
polynomial  residue  calculator.  At  tiie  conclusion  of  n  machine  cycles, 
all  n  symbols  of  the  sequence  will  have  been  used  as  inputs  and  the 
transform  algorithm's  polynomial  division  will  be  complete.  At  this 
time,  the  polynomial  residues  that  have  been  calculated  are  transferred 
into  a  temporary  memory  so  that  they  are  available  for  further  pro¬ 
cessing.  The  second  n  cycles  define  the  evaluation  period.  During 
each  of  these  machine  cycles,  a  residue  polynomial  is  selected  from 
the  residue  calculator's  temporary  memory  and  evaluated  to  produce  a 
single  transform  value.  The  evaluation  process  is  calculated  within 
the  transform  section's  residue  evaluator,  and  the  i-th  transformed 
symbol  becomes  available  at  the  conclusion  of  the  i-th  machine  cycle 
of  the  evaluation  period.  The  transform  section  is  a  pipeline  in 
which  two  adjacent  blocks  of  n  symbols  are  In  process  at  all  times. 

"he  errata-location  and  symbol  correction  sections  also  operate 
with  sequential  symbols.  During  decoding,  the  errata-locator  uses  the 
first  v  transformed  symbols  to  calculate  the  erasure-locator  polynomial 
Then  the  errata-locator  uses  the  next  n-k-v  symbols  to  synthesize  the 
errata-locator  polynomial.  The  symbol  correction  circuitry  uses  the 
synthesized  polynomial  to  correct  the  remaining  k  transformed  symbols. 
Polynomial  synthesis  and  symbol  correction  require  n  machine  cycles. 

The  sequential  symbols  required  as  input  to  the  errata-locator  are 
available  at  the  completion  of  each  machine  cycle  in  the  transform 
section's  evaluation  period.  The  n  machine  cycles  associated  with 
the  operation  of  the  errata-location  section  are  offset  one  machine 
cycle  from  the  n  cycles  that  constitute  the  transform's  evaluation 
period. The total time required to decode an (n,k) code is
2n + 1 machine cycles. In a continuous data mode, the transform
section does not wait until it has processed one block of data before it
starts on the next one, so, after an initial delay of n + 1 machine cycles,
a block of decoded symbols becomes available every n machine cycles.

A single machine cycle is defined as the total number of clock
cycles required to implement the computational steps in a single
iteration of the decoding algorithm. Figure 14 shows the relationship
between a clock cycle and a machine cycle. The timing required to
calculate the intermediate steps in the decoding algorithm is also
shown in this figure. Finite-field multiplication is implemented
within a single clock cycle. Using the programmable array multiplier
structure described in appendix A, we can easily obtain multiplication
times of less than 100 nanoseconds using standard Schottky TTL logic.

A  VLSI  implementation  of  the  array  multiplier  can  probably  achieve  50 
nanosecond  multiplication  times,  corresponding  to  a  clocking  rate  of 
20 MHz. Twenty clock cycles are required to represent one machine
cycle, so a machine cycle lasts one microsecond at this clock rate.
The (255,k) Reed-Solomon decoder can therefore completely decode an
(n,k) code in (2n + 1) microseconds. With continuous operation a
completely decoded block from an (n,k) code would be available every
n microseconds. For example, the (255,k) Reed-Solomon decoder can
decode a codeword from a (31,15) code in 63 microseconds using a
projected 20 MHz clock. In a continuous mode, a decoded codeword
would be available every 31 microseconds.
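
The timing figures above follow from simple arithmetic, which the short Python sketch below reproduces. The function name and its parameters are illustrative only and do not appear in the report; the 50 nanosecond clock period and the 20 clock cycles per machine cycle are the projected VLSI values quoted in the text.

```python
def decode_timing_us(n, clock_period_ns=50, clocks_per_machine_cycle=20):
    """Return (single-block latency, continuous-mode block interval) in microseconds.

    Assumes the 2n + 1 machine-cycle decode time and the n machine-cycle
    block interval stated in the text. All arguments are integers.
    """
    machine_cycle_ns = clock_period_ns * clocks_per_machine_cycle  # 1000 ns at 20 MHz
    latency_ns = (2 * n + 1) * machine_cycle_ns
    interval_ns = n * machine_cycle_ns
    return latency_ns / 1000, interval_ns / 1000


if __name__ == "__main__":
    latency, interval = decode_timing_us(31)
    print(f"(31,15) code: {latency} us single block, {interval} us per block continuous")
```

For n = 31 this gives the 63 and 31 microsecond figures quoted for the (31,15) code.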

3.4 Hardware Complexity

The  transform  section  and  the  errata-location  section  each  could 
be  fabricated  as  a  single  VLSI  device.  Much  of  the  circuitry  required 
for the transformer and errata-locator is highly repetitive, and both
sections share functional circuits that can be designed once and then
repeated.

Most  of  the  circuitry  required  for  the  transformer  is  devoted 
to  the  implementation  of  the  polynomial  residue  calculator.  This 
structure,  consisting  of  35  divider  circuits,  can  be  designed  using 
a  macrocell  with  one  bit  of  shift  register,  one  bit  of  temporary 
memory, and eight elementary logic gates (see Figure 15). The macrocell
represents a single programmable stage from a BFSR, together with the
temporary memory required to operate the transformer in a pipeline fashion.
Approximately 2.4k of these macrocells are required to implement the
residue calculator. An additional 1k elementary logic gates are
required to select the outputs of the divider circuits and to provide
programmability for the different feedback connection polynomials. Due
to  the  large  number  of  shift  register  stages  required  to  implement 
this  section,  the  gate  complexity  will  be  heavily  dependent  on  device 
technology.  However,  the  design  should  be  obtainable  using  current 
NMOS  or  other  mature  technologies. 

The transformer's polynomial residue evaluator can be implemented
with fewer than 1k logic gates. The accumulator portion of this
section requires fewer than 100 gates, and the reconfigurable multiplier
has been designed to be implemented with no more than 900 gates.
All  of  the  preceding  complexity  estimates  are  based  upon  a  direct 
implementation  with  two-input  NAND  or  NOR  gates. 

The  transform  section's  arithmetic  controller  could  be  fabricated 
on the same integrated circuit as the residue calculator and the
residue evaluator. Alternatively, the controller could be implemented
easily on a separate MSI chip containing a modest amount of
programmable read-only memory [10]. A custom or semi-custom LSI
implementation of the entire transformer would require several
integrated circuits.

The architecture associated with the implementation of the
decoding algorithm is shown in Figure 16. The hardware resembles an
adaptive transversal filter. Reconfigurability for different code
parameters is accomplished by separating the binary-extension field
operations from other binary operations. As a result, most of the
errata-location section is configured as a binary transversal filter
(or convolver), and the remaining portion is reconfigurable to
accommodate the necessary field-dependent operations.

Figure 16. Errata Location Section Architecture


Most  of  the  circuitry  used  for  the  errata-location  section  is 
dedicated  to  implementing  the  binary  stages  of  the  transversal  filter. 
This  circuitry  is  highly  repetitive  and  benefits  from  the  modularity 
and common busing structures inherent in VLSI architectures. The
binary  filter  consists  of  128  identical  slices  of  hardware,  favoring 
macrocell  design.  Each  of  the  128  slices  consists  of  32  bits  of  shift 
register  and  approximately  75  additional  logic  gates.  Figure  17  is 
a  logic  diagram  of  a  single  slice  of  the  128  stage  filter.  Within 
each slice, a cell can be identified that consists of four bits of
shift register and approximately eight logic gates. This cellular
design can be repeated to implement the binary transversal filter.
The entire 128-stage filter consists of approximately 4k bits of shift
register and 10k additional logic gates.
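
As a quick consistency check, the per-slice figures can be multiplied out. The sketch below uses only the numbers quoted in this paragraph; the variable names are illustrative.

```python
SLICES = 128                 # identical slices in the binary transversal filter
SHIFT_BITS_PER_SLICE = 32    # bits of shift register per slice
GATES_PER_SLICE = 75         # approximate additional logic gates per slice
SHIFT_BITS_PER_CELL = 4      # bits of shift register in the repeated cell

total_shift_bits = SLICES * SHIFT_BITS_PER_SLICE               # 4096, i.e. about 4k
total_gates = SLICES * GATES_PER_SLICE                         # 9600, i.e. about 10k
cells_per_slice = SHIFT_BITS_PER_SLICE // SHIFT_BITS_PER_CELL  # 8 cells per slice

print(total_shift_bits, total_gates, cells_per_slice)
```

The products agree with the rounded totals of 4k bits of shift register and 10k gates.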

The field-dependent portion of the errata-locator consists of
field-element inversion circuitry, a present discrepancy calculator,
two reconfigurable GF(2^m) serial multipliers, and necessary control
logic. The most complex component of these structures is the
field-element inversion circuitry. The heart of this structure is a
programmable GF(2^m) array multiplier that is identical to the structure
required in the transformer's polynomial residue evaluator. The entire
field-dependent portion, excluding control, consists of fewer than 2k
logic gates. Again, control logic could be implemented in the same
integrated circuit, or a separate device could be designed.

Both the transformer and the errata-locator have hardware
complexities that suggest fabrication as single VLSI devices. Table VIII
summarizes the approximate hardware complexity and features associated
with the transformer and errata-locator. Each of the sections could
easily be implemented as a set of LSI devices where many of the devices
would be identical.

Figure 17. 8-Bit Symbol Correction Slice


Table VIII

(255,k) Encoder and Decoder Hardware Complexity

Function         Architecture                        Complexity

Transformer      Programmable over GF(2^m),          4.5k bits of shift register
                 m = 4, 5, 6, 7, 8                   13k gates
                 Repetitive structure
                 Accommodates 588 codes

Errata Locator   Programmable over GF(2^m),          4.5k bits of shift register
                 m = 4, 5, 6, 7, 8                   15k gates
                 Corrects 2t + s < 128
                 128 identical slices of hardware


SECTION IV


A (51,k) REED-SOLOMON TRANSFORM ENCODER AND DECODER TTL BREADBOARD


A TTL breadboard that encodes and decodes a large number of
Reed-Solomon symbol error-correction codes was designed and fabricated.
This breadboard implements the transform encoding and decoding
algorithms described in section III of this volume. The code of longest
block length that can be processed by the breadboard is a 51-symbol
code with each symbol represented by eight bits. The (51,k)
Reed-Solomon transform encoder and decoder breadboard is shown in Figure 18.

The major difference between the TTL implementation and the
design proposed for future VLSI implementation is size. The breadboard
contains only eight polynomial divider circuits which process
eight-bit symbols and it cannot calculate all of the n-point transforms
that can be processed by the (255,k) encoder and decoder. The
breadboard's transform section does not contain the additional temporary
memory that allows pipeline operation. The breadboard's errata-location
section is not as large as the (255,k) decoder's errata-locator;
consequently the breadboard cannot correct as many combinations of
errors and erasures as can be processed by the (255,k) decoder. The
codes that can be processed by the TTL breadboard are shown in Table
IX. Although the breadboard cannot encode or decode all of the codes
processed by the (255,k) encoder and decoder, it can accommodate a
large subset of them. It therefore serves as a proof-of-concept
verification of the (255,k) encoder and decoder.

4.1 Transform Section

The (51,k) encoder and decoder's transform section is contained
on the five wire-wrap logic boards shown in the lower right corner
of Figure 18. This transformer implements the fast polynomial
evaluation algorithm described in section 3.2.1. The breadboard's
transformer consists of a polynomial residue calculator, a polynomial
residue evaluator, and an arithmetic controller.

Table IX

The residue evaluator and arithmetic controller were designed
identically to those structures described earlier in section 3.2.1.2
and section 3.2.1.3. The breadboard's residue calculator is a scaled
version of the (255,k) encoder and decoder's polynomial residue
calculator. The polynomials that can be used for division and the
transforms that can be processed by the breadboard are shown in Table X.
This table lists the lengths of the transforms, the fields in which
the transforms are defined, the kernels of the transforms, and the
minimal polynomials required for the fast polynomial evaluation
algorithm.

4.1.1  Polynomial  Residue  Calculator 

The breadboard's polynomial residue calculator consists of eight
divider circuits. Each circuit consists of eight identical BFSRs
that can be reconfigured for division using the divisor polynomials shown
in Table X. The residue calculator is designed to operate with our
eight-bit symbol representation (equation 3-9); for operation in GF(2^m),
where m < 8, 8 - m binary shift registers are unused. The eight divider
circuits were designed using the hardware reduction techniques
described in section 3.2.1.1. The resulting implementation contains
only three programmable feedback taps. The division capabilities of
the divider circuits are shown in Table XI.

To compensate for the lack of the memory required for pipeline
operation, each of the breadboard's eight divider circuits operates in
two modes. During the transform's division cycle all divider circuits
are configured to divide by the minimal polynomials associated with
the desired transform. After division is complete, each shift register
is reconfigured so that its feedback taps are deactivated and
each shift register's output is fed back to its input. Each divider
circuit becomes a recirculating shift register that contains the
calculated residue polynomial. This residue can be read out of the
recirculating memory to calculate a transform point; the residue is
simultaneously restored for further processing. The ability to store
the residue polynomials in the calculation hardware demonstrates the
breadboard's hardware efficiency. However, the divider circuits cannot
process new information while they are being used as recirculating
memories. The duty cycle of the breadboard's transform section is
one-half the duty cycle of the (255,k) transformer, and throughput is
reduced proportionally.

Table X

Transform Capabilities of the (51,k) Breadboard

Table XI

Programmability of the (51,k) Transformer's Divider Circuit

The  polynomial  residue  calculator  is  implemented  with  four  of 
the  five  wire-wrap  logic  boards  located  in  the  lower  right  corner  of 
the  breadboard's  card  cage  (Figure  18).  The  four  boards  are  identical 
and each contains two identical slices of hardware. Each slice implements
the eight reconfigurable BFSRs shown in Table XI. A logic-level
diagram  of  a  single  slice  of  the  residue  calculator  is  shown  in  Figure 
19.  In  this  figure,  the  four-to-one  multiplexer  associated  with  each 
shift  register  selects  the  register's  output  tap  which  defines  the 
shift  register's  divisor  polynomial.  The  AND-gate  located  at  the 
output  of  this  multiplexer  controls  the  BFSR's  mode  of  operation.  An 
activated  AND-gate  indicates  polynomial  division;  a  deactivated  AND- 
gate  indicates  recirculating  data  in  the  BFSR.  The  final  eight-to- 
one  multiplexer  is  used  to  select  the  residue  polynomials  that  are 
required  to  complete  the  polynomial  evaluation  algorithm. 
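
The two operating modes of a divider circuit can be illustrated in software. The Python sketch below is a behavioral model only, not the breadboard's logic: it assumes the divisor is a monic polynomial with binary coefficients, so that each bit plane of the eight-bit symbols can be divided independently by its own BFSR. The function names, the tap representation, and the example divisor x^3 + x + 1 are all illustrative assumptions and are not taken from Table X or Table XI.

```python
def residue_bitplane(bits, divisor_taps, degree):
    """Division mode: clock one bit plane (highest-degree coefficient first)
    through a feedback shift register and return the residue [r_0, ..., r_(degree-1)].

    divisor_taps holds the exponents below `degree` whose divisor coefficient is 1.
    """
    reg = [0] * degree
    for b in bits:
        feedback = reg[-1]          # coefficient shifted out of the top stage
        reg = [b] + reg[:-1]        # shift by one stage, new bit enters stage 0
        if feedback:                # feedback path enabled (division mode)
            for t in divisor_taps:
                reg[t] ^= 1
    return reg


def residue_symbols(symbols, divisor_taps, degree, width=8):
    """Divide a sequence of `width`-bit symbols using one register per bit plane."""
    planes = [
        residue_bitplane([(s >> bit) & 1 for s in symbols], divisor_taps, degree)
        for bit in range(width)
    ]
    return [sum(planes[bit][i] << bit for bit in range(width)) for i in range(degree)]


def recirculate(reg):
    """Recirculating mode: feedback taps off, output fed back to the input.

    Returns the coefficients read out (highest degree first) and the register,
    which holds the original residue again after one full rotation.
    """
    out = []
    for _ in range(len(reg)):
        top = reg[-1]
        out.append(top)
        reg = [top] + reg[:-1]
    return out, reg


if __name__ == "__main__":
    # x^4 mod (x^3 + x + 1) = x^2 + x, replicated on every set bit of 0xA5.
    residue = residue_symbols([0xA5, 0, 0, 0, 0], divisor_taps={0, 1}, degree=3)
    print(residue, recirculate(residue))
```

In the model, as in the hardware description, reading the residue out in recirculating mode leaves the register contents unchanged, so the same residue can be evaluated at several field elements.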

4.1.2  Polynomial  Residue  Evaluator 

The breadboard's polynomial residue evaluator is implemented
using the fifth wire-wrap logic board shown in the lower right corner
of Figure 18. The evaluator consists of an eight-bit modulo-two
accumulator and a programmable GF(2^m) array multiplier. A detailed
block diagram of the evaluator is shown in Figure 20. The evaluator
implements a continued product expansion for polynomial residue
evaluation (equation 3-11). Design and operation of polynomial
residue evaluators are identical in the breadboard and the (255,k)
transformer.
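
A continued product expansion is commonly understood as the nested (Horner) form of a polynomial, which needs one multiplication and one modulo-two accumulation per coefficient. The Python sketch below illustrates that reading; equation 3-11 is not reproduced here, so the exact form is an assumption. The field polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D) is a well-known primitive polynomial chosen only for illustration and is not necessarily the one used in the breadboard, and the function names are hypothetical.

```python
def gf_mul(a, b, poly=0x11D, m=8):
    """Multiply two GF(2^m) elements by shift-and-add, reducing modulo `poly`."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # modulo-two accumulation
        b >>= 1
        a <<= 1
        if a >> m:               # degree reached m: subtract (XOR) the field polynomial
            a ^= poly
    return result


def evaluate_residue(coeffs, x, poly=0x11D, m=8):
    """Evaluate a residue polynomial at field element x using nested products.

    `coeffs` lists the coefficients from the highest degree down to the constant.
    """
    acc = 0
    for c in coeffs:
        acc = gf_mul(acc, x, poly, m) ^ c   # multiply by x, then add the next coefficient
    return acc


if __name__ == "__main__":
    print(hex(evaluate_residue([1, 0, 0], 2)))   # x^2 evaluated at x = 2
    print(hex(evaluate_residue([1, 1], 0x80)))   # x + 1 evaluated at x = 0x80
```

Each loop iteration corresponds to one pass through the multiplier and accumulator, which is why the evaluator hardware reduces to a single array multiplier and an eight-bit modulo-two accumulator.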

The critical component of the breadboard's polynomial residue
evaluator is the programmable GF(2^m) array multiplier. (A description
of this multiplier is given in appendix A of this volume.) This
multiplier consists of a pairwise product array, an accumulator array,
and programmable field-reduction circuitry (Figure 20). The pairwise
product array operates with two 8-bit inputs and forms 64 pairwise
modulo-2 products. This array is implemented as 64 2-input AND gates
(see Figure 21). The accumulator array operates on the 64 pairwise
products and forms 15 partial sums (see appendix A). The accumulator
is implemented as 15 Exclusive-OR trees using 2-input Exclusive-OR
gates. The programmable field-reduction circuitry operates on the
15 partial sums and calculates the 8-bit representation of the desired
product. This circuit is implemented as eight Exclusive-OR trees,
whose inputs are programmed in accordance with the field reduction
equations presented in appendix A.

4.1.3  Arithmetic  Controller 

The breadboard's arithmetic controller is located behind the
front panel controls shown in Figure 18. The controller consists of
a programmable up-down counter, a transform kernel-generating circuit,
and preprogrammed memory as shown in Figure 6. The arithmetic
controller is implemented in discrete combinational logic and memory.

4.2 Errata-Location Section

The breadboard's errata-location section is implemented using
the six wire-wrap logic boards shown in the lower left corner of
Figure 18. The breadboard's errata-locator contains 16 symbol
error-correction slices, while the (255,k) decoder's errata-locator
contains 128 symbol error-correction slices. The breadboard can correct
all combinations of t errors and s erasures provided the inequality

$$2t + s \le n - k \le 16 \tag{4-1}$$

is satisfied.
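
The condition in equation (4-1) can be expressed as a small predicate. The Python sketch below reads both inequalities as non-strict, which is the usual errors-and-erasures bound for a code with n - k redundant symbols; that reading, the function name, and the (51,35) example parameters are assumptions made for illustration.

```python
def breadboard_can_correct(t, s, n, k, slices=16):
    """Return True if t errors and s erasures satisfy 2t + s <= n - k <= slices.

    `slices` is the number of symbol error-correction slices (16 on the
    breadboard, 128 in the proposed (255,k) decoder).
    """
    redundancy = n - k
    return redundancy <= slices and 2 * t + s <= redundancy


if __name__ == "__main__":
    # Hypothetical (51,35) code with 16 redundant symbols.
    print(breadboard_can_correct(t=8, s=0, n=51, k=35))   # errors only
    print(breadboard_can_correct(t=5, s=6, n=51, k=35))   # mixed errors and erasures
    print(breadboard_can_correct(t=8, s=1, n=51, k=35))   # exceeds the bound
```

An erasure costs one redundant symbol and an error costs two, so trading errors for erasures extends the number of symbols that can be repaired.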

The breadboard's symbol error-correction slices are implemented
on four identical wire-wrap logic boards. Each of these boards
contains four identical error-correction slices; each slice is equivalent
to the 8-bit slice shown in Figure 17. The logic-level diagram for
one of the breadboard's symbol error-correction slices is shown in
Figure 22.

The field-dependent portion of the errata-location section is
confined to the other two wire-wrap boards shown in the lower left
corner of Figure 18. One board implements field-element division
and contains a programmable GF(2^m) array multiplier that is identical
to the multiplier implemented in the polynomial residue evaluator.
The errata-locator's sixth wire-wrap board contains the programmable
GF(2^m) serial multipliers that are required to implement the decoding
algorithm.

The  timing  and  control  circuitry  required  to  implement  the  errata- 
location  algorithm  is  located  behind  the  front  panel  shown  in  Figure 
18. Also located behind this front panel are interface and self-testing
circuits. The timing, control, interface and self-testing
circuits  are  implemented  in  discrete  combinational  logic  and  memory. 

4.3 Operational Characteristics

The operation of the breadboard is similar to the operation of
the (255,k) encoder and decoder (section 3.3). The breadboard is
designed using readily available Schottky TTL logic. The majority
of the logic functions are implemented using small-scale integrated
(SSI) circuit technology, with a small section of control circuitry
implemented in medium-scale integrated (MSI) logic and memory.

Figure 22. Schematic: 8-Bit Symbol Correction Slice

The errata-location section implements the decoding algorithm
using the same definition of machine cycle as was presented in
section 3.3. Twenty clock cycles are required to implement a single
machine cycle. The time required to implement GF(2^m) multiplication
is the critical factor that is used to define a clock cycle. A
programmable GF(2^m) array multiplier, designed in Schottky TTL logic,
can multiply two field elements in 60 nanoseconds. A clock cycle for
the breadboard is defined to be 100 nanoseconds, so a breadboard
machine cycle is 2 microseconds.

The  transform  section  can  compute  an  n-point  transform  in  4n 
microseconds.  The  first  2n  microseconds  are  required  to  implement 
the  polynomial  division  associated  with  the  fast  polynomial  evaluation 
algorithm. The second 2n microseconds are required to evaluate the
residue polynomials. A single point in the transform is calculated
during  every  machine  cycle  associated  with  the  second  2n  microseconds. 

The  errata-location  section  requires  sequential  data  from  the 
transformer.  The  first  transformed  point  is  available  for  processing 
after  n  +  1  machine  cycles.  The  n  machine  cycles  used  to  synthesize 
the  errata-location  polynomial  and  recover  the  k  information  symbols 
are offset one machine cycle from the n machine cycles that the transformer
requires for evaluation. The total time required to decode an
(n,k)  code  is  2n  +  1  machine  cycles,  or  (4n  +  2)  microseconds. 

The breadboard can operate in either a single-cycle or continuous
mode. In the single-cycle mode the breadboard operates on a single
block of data and requires 4n microseconds to encode and 4n + 2
microseconds to decode. In a continuous mode, the breadboard accepts a
new block of data at 4n microsecond intervals. After an initial delay,
the offset pulse becomes transparent and a block of decoded data is
available every 4n microseconds.
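
For the longest block length handled by the breadboard these expressions can be evaluated directly. The Python sketch below is illustrative only: the function name and dictionary keys are invented here, while the 100 nanosecond clock, the 20 clock cycles per machine cycle, and the cycle counts are those given in this section.

```python
def breadboard_timing_us(n, clock_ns=100, clocks_per_machine_cycle=20):
    """Return breadboard encode time, decode time, and continuous-mode
    block interval in microseconds for an n-symbol block."""
    cycle_ns = clock_ns * clocks_per_machine_cycle   # 2000 ns machine cycle
    encode_ns = 2 * n * cycle_ns                     # n division + n evaluation cycles
    decode_ns = (2 * n + 1) * cycle_ns               # one extra offset cycle
    interval_ns = 2 * n * cycle_ns                   # no pipelining: full transform per block
    return {
        "encode": encode_ns / 1000,
        "decode": decode_ns / 1000,
        "interval": interval_ns / 1000,
    }


if __name__ == "__main__":
    print(breadboard_timing_us(51))
```

For n = 51 the sketch gives 204 microseconds to encode, 206 microseconds to decode, and one decoded block every 204 microseconds in continuous mode.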

The (51,k) Reed-Solomon transform encoder and decoder TTL breadboard
has been interfaced with a semi-automated testing facility. The
basis of this testing facility is a dedicated Hewlett-Packard 2115
minicomputer. As peripherals, the minicomputer has a CRT terminal,
a floppy disk, a high-speed word generator, and a high-speed
input/output interface system. This semi-automated testing facility is
shown  in  Figure  23.  This  facility  was  used  to  debug  and  exercise  the 
(51, k)  encoder  and  decoder  breadboard. 

4.4 Hardware Complexity

The  breadboard's  transform  section  occupies  the  five  wire-wrap 
cards  shown  in  the  lower  right  section  of  the  breadboard's  card  cage 
(Figure  18).  This  section  consists  of  five  8"  x  8"  wire-wrap  logic 
boards.  Two  different  board  designs  implement  the  transform  section. 
Four  transformer  boards  implement  the  polynomial  divider  circuits, 
as  indicated  in  Table  XI.  These  boards  are  identical,  each  board 
containing two slices of each divider circuit. The fifth board in
the transform section implements the continued product expansion for
polynomial evaluation. This board contains a programmable GF(2^m)
array multiplier and accumulator structure.

Each  of  the  transform's  polynomial  division  boards  contains  128 
bits of shift register and approximately 750 logic gates. The polynomial
residue evaluator board contains approximately 900 logic gates.
The  breadboard's  transformer  has  a  total  of  512  bits  of  shift  register 
and  4k  logic  gates.  Each  wire-wrap  board  carries  approximately  50  ICs 
so  that  approximately  250  ICs  are  used  in  the  construction  of  the 
transformer.  The  logic  for  controlling  the  transformer  is  contained 
in the timing, control and interface section.

The errata-locator occupies the six wire-wrap cards shown in
the  lower  left  section  of  the  card  cage  (Figure  18).  This  section 
consists  of  six  8"  x  8"  wire-wrap  logic  boards  and  approximately  300 
ICs.  All  control  for  the  errata-locator  is  provided  by  the  timing, 
control,  and  interface  section. 

Three different logic board designs implement the errata-location
section. The field-independent portion of the errata-locator's
architecture consists of slices of hardware that are shown in Figure 22.
Four of these slices are designed to fit on one wire-wrap logic board.
Sixteen slices are required to implement the errata-locator. Four
of the six logic boards are designed and built identically. The fifth
board contains all the field-dependent logic associated with
calculating the present discrepancy. This board also contains the
field-dependent logic required to sequentially revise the present feedback
connection polynomial. The sixth board contains the field-element
division circuitry. The critical structure located on the sixth
board is a programmable GF(2^m) array multiplier designed identically
with the multiplier used in the transformer's polynomial residue
evaluator.

Each of the symbol-correction-slice boards contains 128 bits of
shift register and 400 logic gates. The field-dependent serial
multiplier board contains approximately 1k logic gates, and the
field-element division board contains approximately 1.5k logic gates. The
errata-location section contains a total of 512 bits of shift register
and approximately 4k logic gates.

The timing, control, and interface section is located behind
the front panel. This special-purpose circuitry is not repetitive.
The construction of this section requires approximately 140 SSI and
MSI circuits. This section provides all of the timing and control
signals needed to operate the transform section. Included in these
signals is the information that determines which residue is selected,
the field element at which the selected residue polynomial is to be
evaluated, the order of evaluation, and all necessary clocking signals
required to operate both the divider circuits and the residue-evaluator
circuit.

The timing, control and interface section also supplies the breadboard's
errata-location section with its timing and control signals. In
addition to providing the signals required to calculate each step in
the decoding algorithm, the timing and control section analyzes each
step in the modified Berlekamp-Massey algorithm and dictates the
necessary branching. The timing and control section performs the
bookkeeping and decision making associated with the algorithm.

APPENDIX A


MULTIPLICATION IN GF(2^m): ALGORITHMS AND STRUCTURES

Decoding  algorithms  for  algebraic  error-correction  codes  require 
arithmetic  operations  that  are  defined  on  the  finite  algebraic  fields 
in  which  the  codes  are  defined.  The  essential  operations  are  finite- 
field multiplication, addition, and inversion. The effective hardware
implementation of an error-correction decoder requires the
design  of  circuitry  that  implements  these  operations. 

Important algebraic error-correction codes are those that are
defined over the binary extension fields GF(2^m). These codes have
symbols represented as binary vectors. Their encoding and decoding
algorithms can be interpreted as special purpose digital signal
processing algorithms. When the fields of operation are binary extension
fields, finite-field addition is defined as bit-by-bit modulo-two
addition and it can be implemented using Exclusive-OR circuitry.
Binary extension field multiplication has many structural
interpretations, each leading to a different hardware implementation, and the
"best" implementation depends upon the particular application.

This  appendix  describes  algorithms  and  structures  that  can  be 
used  to  implement  binary  extension  field  multiplication.  First, 
binary extension field multiplication is described. Then, an overview
of different GF(2^m) multiplier structures is presented. Finally,
the  multipliers  that  are  used  in  the  Reed-Solomon  error-correction 
encoder  and  decoder  are  described  in  detail. 

A.1 MULTIPLICATION IN GF(2^m)

A binary extension field, or Galois field, GF(2^m) is a finite
algebraic field that contains 2^m - 1 nonzero field elements. The
field is generated by an m-th degree irreducible polynomial p(x),
having a root $\alpha$ which lies in the extension field. The specific
polynomial p(x) chosen for each of the fields over which the decoder
operates is a primitive polynomial, meaning that the root $\alpha$ is a
primitive field element, which in turn means that each of the nonzero
field elements can be represented as a power of $\alpha$ (i.e.,
$\alpha^0, \alpha^1, \ldots, \alpha^{2^m-2}$). Each nonzero field element
can also be represented as a binary m-tuple which can be considered a
vector relative to the normal basis $\{\alpha^0, \ldots, \alpha^{m-1}\}$.
The multiplication of two nonzero field elements can be implemented
using either representation. The addition of two field elements is
conveniently implemented as vector addition.
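Binary extension field addition can be interpreted as the pairwise
modulo-2 addition of the m-tuple representations of the field
elements to be added:

$$\alpha^i, \alpha^j \in GF(2^m), \qquad 0 \le i, j \le 2^m - 2$$

$$\alpha^i = (a^i_{m-1}, \ldots, a^i_1, a^i_0), \qquad \alpha^j = (a^j_{m-1}, \ldots, a^j_1, a^j_0)$$

$$\alpha^i + \alpha^j = (a^i_{m-1} \oplus a^j_{m-1}, \ldots, a^i_1 \oplus a^j_1, a^i_0 \oplus a^j_0) \tag{A-1}$$
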
Multiplication  of  two  field  elements  that  are  represented  as 
powers  of  the  primitive  element  has  a  familiar  logarithmic  appearance. 
Binary extension field multiplication using this symbolic
representation has a compact form and it is well suited for implementation
using table look-up procedures.

$$\alpha^i, \alpha^j \in GF(2^m), \qquad 0 \le i, j \le 2^m - 2$$

$$\alpha^i \cdot \alpha^j \triangleq \alpha^{(i+j) \bmod (2^m - 1)} \tag{A-2}$$
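
Equation (A-2) can be illustrated with a pair of look-up tables that convert between the m-tuple and power representations. The Python sketch below is only an illustration: it uses the small field GF(2^4) generated by the primitive polynomial x^4 + x + 1 (0x13), which is an assumed example and not necessarily a polynomial used by the decoder, and the function names are hypothetical.

```python
def build_tables(m=4, poly=0x13):
    """Build log and antilog tables for GF(2^m) generated by `poly`.

    antilog[i] is the m-tuple (as an integer) of alpha^i; log[x] is the
    exponent i such that alpha^i == x.
    """
    size = (1 << m) - 1          # number of nonzero field elements
    antilog = [0] * size
    log = {}
    x = 1
    for i in range(size):
        antilog[i] = x
        log[x] = i
        x <<= 1                  # multiply by alpha
        if x >> m:
            x ^= poly            # reduce modulo p(x)
    return log, antilog


def mul_log(a, b, log, antilog):
    """Multiply by adding exponents modulo 2^m - 1, as in equation (A-2)."""
    if a == 0 or b == 0:
        return 0                 # zero has no logarithm and is handled separately
    return antilog[(log[a] + log[b]) % len(antilog)]


if __name__ == "__main__":
    log, antilog = build_tables()
    print(mul_log(2, 8, log, antilog))   # alpha * alpha^3 = alpha^4
    print(mul_log(9, 2, log, antilog))   # alpha^14 * alpha = alpha^15 = 1
```

Only the nonzero elements have logarithms, so a practical implementation must treat a zero operand as a special case, as the sketch does.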


Binary extension field multiplication using field elements
represented as m-tuples has a definition that resembles convolution
or polynomial multiplication.

This form of multiplication can be interpreted as a two-step
procedure. First, two polynomials of degree at most m-1 are multiplied
to form a product polynomial of degree at most 2m-2. Second,
the product polynomial is reduced, modulo p(x), to a polynomial of
degree at most m-1 whose coefficients are the
product m-tuple. The latter definition of binary extension field
multiplication (equation A-3) can be expanded to indicate the
intermediate operations that are required for implementation.

There are three steps used to implement equation (A-4). First,
the pairwise product of each term within the two m-tuples to be
multiplied is formed. Next, these pairwise products are accumulated
to form the partial products that represent the coefficients of the
product polynomial. Finally, the product polynomial is reduced modulo
p(x).
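
The three steps can be followed in a short software model. The Python sketch below mirrors the pairwise-product, accumulation, and reduction stages; it is a behavioral illustration and not the report's gate-level design. The function name is hypothetical, and the default field polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D) is an assumed example.

```python
def array_multiply(a, b, m=8, poly=0x11D):
    """Multiply two GF(2^m) elements, given as integers, in three explicit steps."""
    # Step 1: m * m pairwise products a_i AND b_j (64 AND gates for m = 8).
    pairwise = [[(a >> i) & (b >> j) & 1 for j in range(m)] for i in range(m)]

    # Step 2: accumulate products of equal degree into 2m - 1 partial sums
    # (15 Exclusive-OR trees for m = 8).
    partial = [0] * (2 * m - 1)
    for i in range(m):
        for j in range(m):
            partial[i + j] ^= pairwise[i][j]

    # Step 3: reduce modulo p(x), replacing x^d (d >= m) by x^(d-m) times
    # the low-order terms of p(x), working from the highest degree down.
    for d in range(2 * m - 2, m - 1, -1):
        if partial[d]:
            partial[d] = 0
            for e in range(m):
                if (poly >> e) & 1:
                    partial[d - m + e] ^= 1

    return sum(bit << e for e, bit in enumerate(partial[:m]))


if __name__ == "__main__":
    print(hex(array_multiply(0x80, 0x02)))   # x^7 * x = x^8, reduced modulo p(x)
    print(hex(array_multiply(0x03, 0x07)))   # (x + 1)(x^2 + x + 1) = x^3 + 1
```

Because every step is a fixed network of AND and Exclusive-OR operations, the three stages map directly onto combinational logic, and only the third stage depends on the choice of p(x).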

A.2 GF(2^m) MULTIPLIER STRUCTURES

Binary  extension  field  multiplier  structures  can  be  partitioned 
into  two  classes.  One  class  is  based  upon  table  look-up  procedures 
and  its  hardware  implementations  are  memory  intensive.  The  other 
class of multiplier structures is based upon the algebraic properties
of the binary extension fields and the hardware implementations use
random  logic.  Both  implementation  strategies  have  their  particular 
advantages  and  disadvantages. 

Memory  Intensive  Multiplier  Structures 

The simplest form of a GF(2^m) multiplier implements a direct
table look-up procedure. There are many possible variations on this
strategy, but in general the two field elements to be multiplied are
used to identify a particular memory location in which the
precalculated product is stored. The different implementations using this
strategy depend on the ways in which the elements to be multiplied
can be combined to identify the memory location and the ways in
which the computed product element can be stored.

Memory intensive multiplier structures have common
characteristics. Since these multipliers are basically memory, the
complexity of the resulting hardware implementation is dependent on
the selected device technology. Because of the range of available
memory technologies, these multipliers can have a wide range of
operational rates. A disadvantage of memory-intensive multipliers is
that the storage requirements increase exponentially with the degree
of the field extension. Available memory technologies can provide
very fast (<100 nanosecond) multiplication times for small fields
(m ≤ 5), but access times increase rapidly as the fields become
larger.
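
The exponential storage growth can be made concrete with a small sketch. It assumes the most direct addressing scheme, in which the two m-bit operands are concatenated to form a 2m-bit address; the function names and this addressing choice are assumptions for illustration, one of the "many possible variations" mentioned above.

```python
def shift_add_multiply(a, b, m, p):
    """Reference GF(2^m) product used only to fill the table."""
    result = 0
    for _ in range(m):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> m:          # degree-m overflow: subtract (XOR) p(x)
            a ^= p
    return result


def build_product_table(m, p):
    """Precompute all products; the address is operand a followed by operand b."""
    size = 1 << m
    return [shift_add_multiply(addr >> m, addr & (size - 1), m, p)
            for addr in range(size * size)]


def table_bits(m):
    """Storage of the direct table: 2^(2m) words of m bits each."""
    return (1 << (2 * m)) * m


table = build_product_table(5, 0b100101)        # GF(2^5), p(x) = x^5 + x^2 + 1
looked_up = table[(0b10100 << 5) | 0b01101]     # one memory access per product
```

Under this scheme the table for m = 5 needs 5,120 bits, while m = 8 needs 524,288 bits.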

Random-Logic  Multiplier  Structures 

Random-logic multiplier structures separate into two different
implementation classes. The computational steps outlined in equation
(A-4) can be performed serially in time, and the resulting hardware
implementation uses sequential logic. Alternatively, the necessary
calculations can be performed concurrently in time, and the resulting
implementation uses arrays of combinational logic. These two
approaches have a unique relationship. Sequential multipliers perform
their computational steps in series, often using the same hardware to
compute different steps. Therefore, sequential multipliers tend to
have simple structures (i.e., LFSRs), but their multiplication times
are relatively slow. The combinational logic array multipliers
perform many operations simultaneously, using different sections of
hardware to compute different steps. The array-type multipliers tend
to have complex hardware but their multiplication times are very
fast. In general, there is an inverse relationship between a
multiplier structure's hardware complexity and its multiplication
time.

A GF(2^m) sequential multiplier is shown in Figure A-1. This
circuit directly implements the computational steps of equation (A-3).
The sequential multiplier is a simple structure, requiring 4m bits
of shift-register circuitry and a maximum of 12m logic gates (a logic
gate is taken to mean either a two-input NAND or NOR gate). A
complete multiplication cycle, assuming serial output, takes 2m-1
clock cycles.

A simple description will illustrate this circuit's operation.
Initially, the irreducible polynomial associated with the field of
operation, p(x) = p_0 + p_1 x + ... + p_m x^m, is loaded into the P
register. The product register, C, is cleared to zero. The m-tuple
representation of the field element α^i is stored in the multiplier
register, A, while the m-tuple representation of the field element
α^j is stored in the multiplicand register, B. The multiplier
register effectively holds α^i as the coefficients of an (m-1)th
degree polynomial. The first clock pulse latches

    α^j_{m-1} (α^i_0 + α^i_1 x + ... + α^i_{m-1} x^{m-1})

into the product register. On the next clock cycle this product is
shifted toward the right. This is equivalent to multiplying the
product by x. Prior to this shift, the content of the C_{m-1}
register was α^j_{m-1} α^i_{m-1}. This corresponds to
α^j_{m-1} α^i_{m-1} x^{m-1}. The shift creates α^j_{m-1} α^i_{m-1} x^m,
and the feedback connection polynomial, p(x), performs division or
polynomial modulo reduction on this overflow term. At the conclusion
of the second clock cycle, the product register contains

    α^j_{m-1} (α^i_0 + α^i_1 x + ... + α^i_{m-1} x^{m-1}) x mod p(x)
        + α^j_{m-2} (α^i_0 + α^i_1 x + ... + α^i_{m-1} x^{m-1})        (A-5)

Figure A-1. Sequential GF(2^m) Multiplier

This process continues for m-1 clock cycles. At the completion of
the (m-1)th clocking cycle, the content of the product register is:

    (... ((α^j_{m-1} α^i(x)) x mod p(x) + α^j_{m-2} α^i(x)) x mod p(x)
        + ... ) x mod p(x) + α^j_0 α^i(x)                              (A-6)

where α^i(x) = α^i_0 + α^i_1 x + ... + α^i_{m-1} x^{m-1}. This
expression can be reduced to:

    Σ_{k=0}^{m-1} α^j_k α^i(x) x^k mod p(x)                            (A-7)

At the conclusion of the (m-1)th clocking cycle, the switch S_1 is
grounded and the product is shifted out of the product register on
the following m clock cycles. There are 2m-1 clock cycles required
for complete multiplication.
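
The shift, feedback, and accumulate behavior described above can be imitated in software. This is a minimal sketch assuming integer-encoded field elements (bit k is the coefficient of x^k); it models only the arithmetic of the product register, not the register-level timing of Figure A-1, and the function name is an assumption.

```python
def sequential_multiply(a, b, m, p):
    """Horner-style GF(2^m) product: one shift-and-accumulate step per bit of b."""
    c = 0                               # product register, cleared to zero
    for k in range(m - 1, -1, -1):      # most significant coefficient of b first
        c <<= 1                         # shift = multiply the partial product by x
        if c >> m:                      # an x^m overflow term was created
            c ^= p                      # feedback connections reduce it mod p(x)
        if (b >> k) & 1:                # coefficient b_k gates the multiplier a
            c ^= a
    return c


# GF(2^5), p(x) = x^5 + x^2 + 1.
product = sequential_multiply(0b10100, 0b01101, 5, 0b100101)
```

The same hardware (here, the same loop body) is reused for every coefficient, which is why the structure is small but slow.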

The structure of a GF(2^m) array multiplier is shown in Figure A-2.
It implements the computational steps outlined in equation (A-4). The
array multiplier consists of three functionally separate hardware
sections. The first section calculates the pairwise products
between the m-tuples associated with the field elements to be
multiplied. A second section operates on these products and
accumulates the 2m-1 terms that are the coefficients of the product
polynomial indicated in equation (A-4). The third section operates on
the 2m-1 accumulated product terms and implements modulo p(x)
reduction, resulting in the final product.

The array multiplier can multiply two field elements quickly.
Fast multiplication rates are obtained because added circuitry is
included to perform the requisite operations in parallel and because
the modulo p(x) computation is implemented asynchronously as an
end-around-carry operation. The end-around-carry reduction is
implemented by simply feeding back the overflow bits from the
product equation in a predetermined manner.

The hardware complexity of the array multiplier shown in Figure
A-2 can be measured in terms of logic gates. The pairwise product
array requires m^2 gates to calculate the m^2 partial products. The
accumulator array is configured as 2m-1 Exclusive-OR trees. The
maximum number of logic gates required to implement this section is
m(m-1) gates. The field reduction circuitry can be implemented as m
Exclusive-OR trees, with the maximum number of logic gates equal to
m^2. The total number of gates required to implement the GF(2^m)
array multiplier is on the order of 4m^2 gates.

The total time required to multiply two field elements is
dependent upon the propagation delay, τ, through a logic gate. The
time required to calculate the m^2 pairwise products is 2τ. The delay
through either the accumulator array or the field reduction circuitry
is 3τ⌈log2(m)⌉, where ⌈x⌉ is the smallest integer greater than or
equal to x. The maximum time required to multiply two field elements
in GF(2^m) is 2τ + 6τ⌈log2(m)⌉. Assuming a 3 nanosecond propagation
delay, two field elements from GF(2^8) can be multiplied in 60
nanoseconds.
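
The delay estimate can be checked numerically. The sketch below simply evaluates the expression 2τ + 6τ⌈log2(m)⌉ from the text; the function name is an assumption.

```python
def array_multiplier_delay(m, tau_ns):
    """Worst-case multiply time: pairwise products plus two Exclusive-OR tree stages."""
    levels = (m - 1).bit_length()        # ceil(log2(m)) for m >= 1
    pairwise = 2 * tau_ns                # pairwise product array
    tree = 3 * tau_ns * levels           # accumulator array or field reduction
    return pairwise + 2 * tree


delay_gf256 = array_multiplier_delay(8, 3)   # GF(2^8), 3 ns per gate
```

For m = 8 and τ = 3 ns this gives 6 + 2(27) = 60 ns, in agreement with the figure quoted above.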

A.3  REED-SOLOMON ENCODER AND DECODER MULTIPLIER STRUCTURES

The design for the Reed-Solomon (255,k) transform encoder and
decoder has five separate requirements for binary extension field
multiplication. Each application requires multiplication in GF(2^m)
where m = 4, 5, 6, 7, or 8. The design of a multiplier that can be
programmed to operate in more than one extension field is difficult
because the multiplier structure that is designed to operate in
GF(2^m) is not directly expandable for operation in GF(2^(m+1)). The
requirement to operate in five different fields eliminates the memory
intensive multipliers as candidate multiplier structures, since the
need for reconfigurability produces multiplication rates that are too
slow.

Three field-dependent multiplier structures are used in the
encoder and decoder; two of the designs are repeated twice. The
design of each multiplier is based on the multiplication algorithm
shown in equation (A-3). The primitive polynomials that generate each
field are shown in Table A-I. The remainder of this appendix will
describe the design and operation of the three field-dependent
multipliers used in the encoder and decoder.

GF(2^m) Programmable Array Multiplier

The most complicated multiplier structure used in the encoder
and decoder is a programmable GF(2^m) array multiplier similar to the
one previously described. (A block diagram of the programmable array
multiplier is shown in Figure A-3.) The programmable array multiplier
consists of a pairwise product array, an accumulation array, and
field reduction circuitry. The field reduction circuitry is
reconfigurable to provide modular polynomial reduction using any of
the primitive polynomials shown in Table A-I.

The programmable array multiplier is designed to operate with
our standard 8-bit symbols so that any symbol from GF(2^m), with
m < 8, has zeros padded in the most significant bit positions. The
programmable array multiplier's pairwise product array operates with
two 8-bit symbols and calculates 64 pairwise products. The padded
zeros in the standard 8-bit symbols produce the correct zero products
for operation in fields with m < 8. The pairwise product array
contains no additional hardware beyond what would be required for
normal operation in GF(2^8).

The accumulator array operates on the 64 pairwise products from
the product array to calculate 15 partial sums. These terms are the
coefficients of the product polynomial shown in equation (A-4). The
15 coefficients are formed using 15 Exclusive-OR trees. When
multiplication is required in fields where m < 8, the padded zeros
that produce the correct zero-valued pairwise products result in the
correct summations within the accumulator array. Operation in the
five different finite fields is obtained without an increase in the
hardware that is required for operation in GF(2^8).

Table A-I

Primitive Polynomials Used to Design the
Programmable GF(2^m) Multiplier Structures


The field reduction circuitry is the only field-dependent section
of the programmable array multiplier. This structure uses the 15
partial sums that have been calculated in the accumulator and
implements polynomial reduction modulo p(x). The result is a standard
8-bit symbol that is the correct product. The field-reduction
circuit implements asynchronous end-around-carry to compute modular
polynomial reduction. The reduction is different for each field of
operation because a different primitive polynomial p(x) is used.
Programmability is provided by designing different feedback paths to
be selected for the different fields. A block diagram of the
programmable field reduction circuit is shown in Figure A-4. Here
{P'_0, P'_1, ..., P'_7} are the eight bits in our standard symbol
representation of the product, and {P_0, P_1, ..., P_14} are the
coefficients of the product polynomial formed in the accumulator
array. The signals {F_4, F_5, F_6, F_7, F_8} represent control flags
that indicate the field of operation. For multiplication in GF(2^m),
the signal F_m is a logic "1"; all other F_i's, where i ≠ m, are set
to logic "0". These flags reconfigure the polynomial-reduction
circuitry to the correct feedback paths. The logic functions
implemented by the field reduction circuit are shown in Figure A-4.


The field-reduction array consists of 8 Exclusive-OR trees.
Each tree has inputs that are gated with the field-select control
signals F_m. For operation in GF(2^m), each tree accumulates only the
terms in the logic equations shown in Figure A-4 that are associated
with the field-select signal F_m.

Figure A-4. Programmable GF(2^m) Array Multiplier Field Reduction
Circuit

The GF(2^m) programmable array multiplier is the critical
component within the transformer's polynomial residue evaluator. The
array multiplier was selected for this application because of its fast
multiplication rates. The multiplier is also attractive because of
its repetitive architecture and the low complexity of hardware
required for reconfigurability.
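
The split between field-independent and field-dependent sections can be sketched as follows. The primitive polynomials in ASSUMED_P are commonly used ones chosen for illustration; they are an assumption and may differ from those listed in Table A-I. The dictionary lookup stands in for the field-select flags F_4 through F_8.

```python
# Assumed primitive polynomials, bit m set (not necessarily those of Table A-I).
ASSUMED_P = {
    4: 0b10011,        # x^4 + x + 1
    5: 0b100101,       # x^5 + x^2 + 1
    6: 0b1000011,      # x^6 + x + 1
    7: 0b10001001,     # x^7 + x^3 + 1
    8: 0b100011101,    # x^8 + x^4 + x^3 + x^2 + 1
}


def programmable_multiply(a, b, m):
    """Multiply zero-padded 8-bit symbols in GF(2^m), m = 4..8."""
    # Field-independent sections: 64 pairwise products accumulated into
    # the 15 coefficients P_0..P_14 (padded zeros contribute nothing).
    coeffs = 0
    for j in range(8):
        if (a >> j) & 1:
            coeffs ^= b << j

    # Field-dependent section: reduction modulo the selected p(x).
    p = ASSUMED_P[m]
    for n in range(14, m - 1, -1):
        if (coeffs >> n) & 1:
            coeffs ^= p << (n - m)
    return coeffs


same_hardware_gf32 = programmable_multiply(0b00010100, 0b00001101, 5)
same_hardware_gf256 = programmable_multiply(0b10000000, 0b00000010, 8)
```

Only the reduction loop consults m, mirroring the statement that the field reduction circuitry is the only field-dependent section.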

The programmable array multiplier is also used in the
implementation of the Berlekamp-Massey shift register synthesis
algorithm. The array multiplier was chosen for this application
because of its short processing time. The array multiplier calculates
the product of the present discrepancy and the inverse of the past
discrepancy, d^(N) [b^(N-1)]^(-1). This product is calculated once
during each iteration of the algorithm and the time required to form
this product limits the operational speed of the entire
errata-location section.

GF(2^m) α^2 Structure

The second field-dependent multiplier structure used in the
implementation of the encoder and decoder is special-purpose
field-element squaring circuitry. This structure calculates the
square of any field element in GF(2^m), where m = 4, 5, 6, 7, or 8.
The α^2 multiplier uses our standard symbol representation and
implements

    α^i · α^i = α^(2i mod (2^m - 1))

    α^(2i) = Σ_{j=0}^{m-1} α^i_j (Σ_{k=0}^{m-1} α^i_k x^k) x^j mod p(x)    (A-8)

where α^i is an element from GF(2^m) and p(x) is the associated
primitive polynomial shown in Table A-I. In equation (A-8), the
cross-products α^i_j α^i_k, such that k ≠ j, are zero modulo-2 because
each one occurs twice in the sum. The products α^i_j α^i_k, such that
k = j, are α^i_j. Only the even ordered coefficients of the product
polynomial are formed:

    α^i · α^i = (α^i_0 + α^i_1 x^2 + α^i_2 x^4 + α^i_3 x^6 + α^i_4 x^8
                 + α^i_5 x^10 + α^i_6 x^12 + α^i_7 x^14) mod p(x)          (A-9)


The programmable field reduction circuitry that implements
equation (A-9) is similar to the circuitry used in the programmable
array multiplier. However, the field reduction associated with the
α^2 multiplier is simpler because there are no odd ordered
coefficients in the product polynomial. The i-th bit in the field
element to be squared is the 2i-th bit in the product polynomial,
making the square product formation implicit. The logic equations
that implement the α^2 multiplier are shown in Figure A-5. In this
figure, the input variables {P_0, P_1, ..., P_7} represent the eight
bits in our standard symbol representation of the field element and
the variables {P'_0, P'_1, ..., P'_7} represent the squared element.
Again, the variables {F_4, F_5, F_6, F_7, F_8} are signals that
represent the desired field of operation.
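
A short sketch shows why squaring needs no product array: the input bits are merely spread to the even positions before reduction. The integer encoding (bit k is the coefficient of x^k) and the function name are assumptions for illustration.

```python
def gf_square(a, m, p):
    """Square a GF(2^m) element: bit i of the input becomes coefficient 2i."""
    spread = 0
    for i in range(m):
        if (a >> i) & 1:
            spread |= 1 << (2 * i)      # no odd-order coefficients are created

    # Field reduction modulo p(x), highest degree first.
    for n in range(2 * m - 2, m - 1, -1):
        if (spread >> n) & 1:
            spread ^= p << (n - m)
    return spread


# GF(2^5), p(x) = x^5 + x^2 + 1: (x^4 + x^2)^2 = x^8 + x^4.
squared = gf_square(0b10100, 5, 0b100101)
```

Here 10100 squares to 11101, i.e. x^8 + x^4 reduced modulo x^5 + x^2 + 1.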

Field Element Inversion Circuit

The GF(2^m) programmable array multiplier and the GF(2^m) α^2
multiplier can be combined to implement a division-by-inversion
algorithm. Once during each iteration of the Berlekamp-Massey
algorithm, the product d^(N) [b^(N-1)]^(-1) is formed, where both
d^(N) and b^(N-1) are elements from GF(2^m). The term [b^(N-1)]^(-1)
is the multiplicative inverse of the past discrepancy.

Figure A-5. Programmable α^2 Multiplier

The operation of the field-element inversion circuit can be
described with the aid of an example. On the first clock cycle, two
of the switches are closed and all other switches are open. The
previous discrepancy α^i (= b^(N-1)) is fed through the α^2 multiplier
and the result is latched into the X information register. The
element α^i is simultaneously fed into the Y information register.
The GF(2^m) array multiplier asynchronously calculates the product of
the Y information register (α^i) and the X information register
(α^(2i)). The result is latched into the P register. At the
conclusion of the first clock cycle, the P information register
contains α^(3i). During the second clock cycle a different pair of
switches is closed and all other switches are open. The contents of
the P register are fed directly into the Y register, while the
contents of the X register are fed back into the GF(2^m) α^2
multiplier, and the squared product is stored in the X register. The
contents of the X and Y registers are then multiplied, and the
product, α^(7i), is stored in the P register. This operation of the
structure is repeated until m-2 clock cycles have been completed. At
this time, the content of the P register is α^(-i/2), the element
whose square is α^(-i). On the next clock cycle, the final pair of
switches is closed and all other switches are open. The contents of
the P register are fed to the GF(2^m) α^2 multiplier. The result is
latched into the X register, while the present discrepancy α^j
(= d^(N)) is fed into the Y register. The contents of the X and Y
registers are then multiplied, and the product, α^(j-i), is stored in
the P register.

The operation of the field element inversion circuit requires
m-1 clock cycles. The information shown in Table A-II indicates the
status of each switch and the contents of each information register
as a function of clock cycles. Table A-II represents the operation
of the GF(2^m) division circuit when m = 8. Operation for m < 8 is
accomplished by using fewer cycles to square the contents of the X
register.
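
The register transfers can be followed in a short sketch. It assumes integer-encoded elements, m ≥ 3, and a generic shift-and-add multiply standing in for both the array multiplier and the α^2 multiplier; the names gf_mul and gf_divide are assumptions, and switch control is not modeled.

```python
def gf_mul(a, b, m, p):
    """Generic GF(2^m) product (stands in for the array multiplier)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> m:
            a ^= p
    return result


def gf_divide(d, b, m, p):
    """Return d * b^(-1) using m-1 square-and-multiply cycles (m >= 3)."""
    if b == 0:
        raise ZeroDivisionError("zero has no multiplicative inverse")
    x = gf_mul(b, b, m, p)              # cycle 1: X = b^2, Y = b
    prod = gf_mul(x, b, m, p)           #          P = b^3
    for _ in range(m - 3):              # cycles 2 .. m-2
        y = prod                        # Y <- P
        x = gf_mul(x, x, m, p)          # X <- X^2
        prod = gf_mul(x, y, m, p)       # P <- X * Y
    # After m-2 cycles P = b^(2^(m-1) - 1); its square is b^(2^m - 2) = b^(-1).
    x = gf_mul(prod, prod, m, p)        # final cycle: X <- P^2, Y <- d
    return gf_mul(x, d, m, p)


# GF(2^5), p(x) = x^5 + x^2 + 1: divide 11111 by 01101.
quotient = gf_divide(0b11111, 0b01101, 5, 0b100101)
```

The loop runs m-3 times, so together with the first and last cycles the division takes m-1 cycles, as stated above.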

GF(2^m) Serial Multiplier

The final field-dependent multiplier structure used in the
encoder and decoder is one that is used for sequential multiplication.
This multiplier is used in the calculation of the present discrepancy,
d^(N). The same multiplier design is used in the present feedback
polynomial calculator.

The description of the GF(2^m) serial multiplier is facilitated
by reexamining the definition of GF(2^m) multiplication as defined in
equation (A-3). This definition of multiplication was shown to be
equivalent to the multiplication of two (m-1)th degree polynomials
followed by the polynomial reduction of the resulting 2(m-1)th degree
product polynomial. The GF(2^m) programmable array multiplier
implements this multiplication with all products calculated
simultaneously. The serial multiplier operates sequentially to
implement equation (A-1) in the form

    α^i α^j = Σ_{ℓ=0}^{m-1} (α^j_ℓ α^i) x^ℓ mod p(x)                  (A-11)

This equation is implemented using the structure shown in Figure A-7,
which uses m clock cycles to complete the calculation. Prior to the
first clock cycle, all latches have been cleared to zero, one switch
has been closed and the other opened. The first clock cycle
corresponds to ℓ = 0 in equation (A-11). During this cycle α^i is fed
through the closed switch and latched in the information register X.
The contents of X are multiplied by α^j_0 and latched in register P.
During the second machine cycle, ℓ = 1, the first switch is opened and
the second switch is closed. The contents of the X register (α^i) are
fed to the serial multiplier, and the output of the multiplier
(α^i x mod p(x)) is latched into the X register. The contents of the
X register are then multiplied with α^j_1 and accumulated with the
contents of the P register. After two full clock cycles the content
of the P register is α^j_0 α^i + α^j_1 α^i x mod p(x). The operation
continues for m complete clock cycles, after which the P register
contains the product α^i α^j.

Field Element Division in GF(2^m)

The critical component of the multiplier structure shown in
Figure A-7 is the GF(2^m) serial multiplier. This multiplier takes
any field element α^j and forms the field element α^j x mod p(x),
which is equivalent to α^(j+1). A diagram of the programmable GF(2^m)
serial multiplier is shown in Figure A-8. In this figure, the
variables {P_0, P_1, ..., P_7} represent our standard symbol
representation of the input to the serial multiplier, and the symbols
{P'_0, P'_1, ..., P'_7} represent the output of the multiplier. As
before, the symbols {F_4, F_5, F_6, F_7, F_8} are flags that are used
to indicate the field of operation. For a given input α^j, the serial
multiplier calculates

    α^(j+1) = (α^j_0 + α^j_1 x + ... + α^j_7 x^7) x mod p(x)          (A-12)

where α^j and α^(j+1) are elements from GF(2^m). Equation (A-12) can
be interpreted as shifting the 8-bit representation of the field
element α^j and performing modulo p(x) field reduction. The field
reduction is implemented in an end-around-carry technique that is
identical to that used in both the programmable array multiplier and
the α^2 multiplier. However, the modular reduction is trivial in the
serial multiplier because the product polynomial that is to be
reduced is only of degree m. Since p(x) is also of degree m, only one
coefficient has to be fed back for each field of operation. The
simplicity of this circuit is indicated by the logic equations shown
in Figure A-8.

Sequential GF(2^m) Multiplication Using
Programmable Serial Multiplier

Figure A-8. Programmable GF(2^m) Serial Multiplier
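
A minimal sketch of the serial multiplier and of the sequential product built around it follows. It assumes integer-encoded elements with bit k as the coefficient of x^k; the names times_x and serial_product are assumptions, with x and acc standing in for the X and P registers.

```python
def times_x(a, m, p):
    """Serial multiplier: shift by one position, then feed back one overflow bit."""
    a <<= 1
    if a >> m:          # coefficient of x^m is set
        a ^= p          # end-around carry through p(x)
    return a


def serial_product(a, b, m, p):
    """Accumulate b_l * (a * x^l mod p(x)) over m clock cycles."""
    x = a               # X register
    acc = 0             # P register, cleared before the first cycle
    for l in range(m):
        if l > 0:
            x = times_x(x, m, p)        # X <- X * x mod p(x)
        if (b >> l) & 1:                # coefficient b_l selects the term
            acc ^= x
    return acc


# GF(2^5), p(x) = x^5 + x^2 + 1.
shifted = times_x(0b10000, 5, 0b100101)                 # x^4 * x = x^2 + 1
product = serial_product(0b10100, 0b01101, 5, 0b100101)
```

Because only a degree-m term can overflow on each step, the reduction in times_x is a single conditional XOR.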

The GF(2^m) serial multiplier operates with sequential data.
Many partial products can be "summed" in a binary fashion, and the
GF(2^m) serial multiplier can be used to implement associative
multiplication

    α^i (α^0 + α^1 + α^2 + ... + α^N)                                 (A-13)

This type of multiplication is implemented by forming the binary sums
(modulo-two) of the terms within the parentheses and then using the
serial multiplier to complete the multiplication. In this manner,
N+1 GF(2^m) multipliers can be replaced by one GF(2^m) serial
multiplier in the formation of convolution products.

APPENDIX B


AN EXAMPLE: A (31,15) REED-SOLOMON
CODE CONSTRUCTED OVER GF(2^5)


In this appendix, the decoding algorithm is illustrated by means
of an example of decoding with the (31,15) Reed-Solomon code. A
message sequence of length k = 15 over GF(2^5) is given. Encoding
is performed by forming a sequence of length 31 with zeros in the
first 16 positions and the information symbols in the remaining
positions, and then applying the inverse transformation to yield a
length-31 codeword, also over GF(2^5). Field elements are represented
by powers of a primitive element α.

The representation of GF(2^5) as binary polynomials modulo the
irreducible polynomial

    x^5 + x^2 + 1

is:

    0    = 00000    α^7  = 10100    α^15 = 11111    α^23 = 01111
    α^0  = 00001    α^8  = 01101    α^16 = 11011    α^24 = 11110
    α^1  = 00010    α^9  = 11010    α^17 = 10011    α^25 = 11001
    α^2  = 00100    α^10 = 10001    α^18 = 00011    α^26 = 10111
    α^3  = 01000    α^11 = 00111    α^19 = 00110    α^27 = 01011
    α^4  = 10000    α^12 = 01110    α^20 = 01100    α^28 = 10110
    α^5  = 00101    α^13 = 11100    α^21 = 11000    α^29 = 01001
    α^6  = 01010    α^14 = 11101    α^22 = 10101    α^30 = 10010


Addition  is  defined  as  component-by-component  addition  modulo  2, 
and  multiplication  by  addition  of  exponents  modulo  31. 
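
The table and the two arithmetic rules can be reproduced with a few lines of code. This is a minimal sketch assuming each 5-tuple is stored as an integer with the leftmost bit as the x^4 coefficient; the names build_tables, add_exp, and mul_exp are assumptions for illustration.

```python
def build_tables(p=0b100101, m=5):
    """Generate alpha^0 .. alpha^(2^m - 2) by repeated multiplication by x."""
    antilog = []
    value = 1
    for _ in range((1 << m) - 1):
        antilog.append(value)
        value <<= 1
        if value >> m:              # reduce modulo x^5 + x^2 + 1
            value ^= p
    log = {v: e for e, v in enumerate(antilog)}
    return antilog, log


ANTILOG, LOG = build_tables()


def add_exp(i, j):
    """Exponent of alpha^i + alpha^j, or None when the sum is the zero element."""
    total = ANTILOG[i % 31] ^ ANTILOG[j % 31]   # component-by-component mod 2
    return LOG[total] if total else None


def mul_exp(i, j):
    """Exponent of alpha^i * alpha^j: add exponents modulo 31."""
    return (i + j) % 31
```

For example, add_exp(6, 7) returns 24, since 01010 + 10100 = 11110 = α^24, and mul_exp(20, 28) returns 17.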

Let the message be the arbitrarily chosen sequence {M_i},
i = 0, 1, ..., 14, where:


M„ 

= 

a4 

M 

= 

a7 

C 

B 

M, 

_ 

a12 

M 

— 

a15 

1 

9 

M„ 

_ 

a28 

M  „ 

_ 

a24 

1  0 

M, 

_ 

a29 

M 

— 

a* 

3 

1 1 

M, 

_ 

oc17 

M 

= 

a6 

4 

12 

M 

— 

M,  0 

- 

aJ  4 

5 

1  3 

M. 

_ 

a3j 

M 

= 

ct2  3 

6 

14 

*7 

= 

alO 

The padded message sequence {A_i} is then


A0  =  0 


A  =0 
1 


1  5 

A,  =  M 
lb  0 

A  =  M 
17  1 


=  a12 


A  =  M 
30  14 


=  a’-0 


To compute {a_i}, the inverse transform of the {A_i}, consider
A(x) = A_0 + A_1 x + ... + A_30 x^30, the corresponding 30th degree
polynomial. Then

    a_i = A(α^(-i)),    i = 0, 1, ..., 30

To evaluate A(α^(-i)), evaluate t_(-i)(α^(-i)) where t_(-i)(x) is the
residue polynomial which corresponds to division of A(x) by m_(-i)(x),
the minimum polynomial of α^(-i). That is

    a_i = A(α^(-i)) = t_(-i)(α^(-i)),    i = 0, 1, ..., 30.
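
The encoding step and the residue shortcut can be sketched as follows. The message exponents below are hypothetical values chosen only for illustration and are not the message of this appendix; the function names are likewise assumptions. Direct Horner evaluation is used for the transforms, and the residue modulo m_1(x) = x^5 + x^2 + 1 is shown for one symbol.

```python
P, M_BITS, N = 0b100101, 5, 31          # GF(2^5), p(x) = x^5 + x^2 + 1


def gf_mul(a, b):
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> M_BITS:
            a ^= P
    return result


ALPHA_POW = [1]
for _ in range(N - 1):
    ALPHA_POW.append(gf_mul(ALPHA_POW[-1], 0b00010))


def evaluate(poly, point):
    """Horner evaluation of poly[0] + poly[1] x + ... at a field element."""
    acc = 0
    for coeff in reversed(poly):
        acc = gf_mul(acc, point) ^ coeff
    return acc


def inverse_transform(A):
    return [evaluate(A, ALPHA_POW[(-i) % N]) for i in range(N)]


def forward_transform(a):
    return [evaluate(a, ALPHA_POW[j]) for j in range(N)]


def residue_mod_m1(poly):
    """Remainder of poly(x) on division by m_1(x) = x^5 + x^2 + 1."""
    r = list(poly)
    for d in range(len(r) - 1, 4, -1):
        c = r[d]
        if c:
            r[d] = 0
            r[d - 3] ^= c       # x^d = x^(d-3) + x^(d-5) modulo m_1(x)
            r[d - 5] ^= c
    return r[:5]


# Hypothetical message exponents (illustration only).
message = [ALPHA_POW[e] for e in (4, 12, 28, 29, 17, 0, 30, 10, 7, 15, 24, 1, 6, 14, 23)]
padded = [0] * 16 + message
codeword = inverse_transform(padded)
```

Since α^16 is a root of m_1(x), evaluating the five-term residue at α^16 gives the same value as evaluating the full 30th degree polynomial, which is the saving the residue method provides.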


The seven minimum polynomials of the elements of GF(2^5) over
GF(2), and the corresponding t_i(x), are listed in Table B-I.


To  illustrate  this  procedure. 


a!5  =  A(ri_]?)  =  A»H') 


=  t 

16 

(djp)  =  t} 

(a16) 

(since  m^(x) 

= 

(x)) 

=  a22 

+  a^(a'S) 

+ 

"t 2  8  (a 3  8  )  2  +  u 3  2  (a 3 ' 

)3  + 

a  2  0  ( ■:< 1 

=  a22 

+  a9  +  a2*3 

+ 

a28 

=  a1  ^ 

of  the  inverse  transform 

is  determined 

similarly. 

au 

=  a22 

an 

_  a  1  2 

a2l 

=  a28 

a 

1 

=  a8 

a 

1? 

=  all 

a2  2 

=  a27 

a 

2 

=  a26 

ai3 

=  a22 

J23 

= 

a3 

=  a38 

ai4 

=  a° 

a24 

=  a9 

a 

4 

=  a18 

a!5 

=  a!1 

a 

25 

=  a  1  9 

a 

c 

=  a22 

a 

16 

=  a26 

a 

26 

=  a4 

a 

b 

=  a21 

a!7 

=  al 

a 

27 

=  a24 

a 

7 

=  0 

ai8 

=  a20 

a28 

=  all 

a8 

=  a18 

a!9 

=  a28 

a29 

=  0 

a9 

=  a28 

a 

20 

=  a  1 7 

a30 

=  a8 

aio 

=  a30 
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Table  B-I 


polynomial  corresponding  to  division  of  A(x)  bv 


r 


This  is  the  transmitted  message. 

Suppose the following t = 5 errors and r = 6 erasures are
introduced, such that


Errors                    Erasure Locations (known)


a  : 
l 

— ► 

a14 

a6  = 

a8 

a , ,  : 

a3  — ► 

a  3  8 

a_  : 

a7 

14 

7 

a  ,  „ : 

.t 2  0 — ■*. 

a22 

a  „  : 

a8 

18 

8 

H.  3 " 

a3  _* 

a31 

a9  : 

Q 

a " 

a  : 

0  — * 

a9 

a ,  „ : 

a 3  8 

29 

10 

a  : 

a  1 1 

1 1 

where a_i: α^j → α^k means that a_i is changed from α^j to α^k. The
erasures are consecutive to simulate a burst. The received sequence,
{r_i}, is


r , 
r 


10 


1 1 


=  a22 

=  a14  (error) 


=  a26 
=  a3j 

=  a18 
=  a22 

=  a2  1 

=  0 
=  a36l 
=  a23< 
=  a30' 
=  a12- 


r  =  a 1 1 
12 

ri3  =  a22 

r,,  =  a18  (error) 
14 


r  =  a 
1  5 


1 1 


,26 


16 

17 


=  a 3 


( erasures) 


r  =  a22  (error) 

18 

r  =  a26 

19 

r  =  a17 

20 


r  =  a28 
21 

r  =  a27 
22 

r  =  a33  (error) 
23 


24 


=  a9 


,19 


25 

'26 


=  a4 


r  =  a24 

27 

=  a31 

28 


29 
'  30 


a9  (error) 

a6 
where r_6, r_7, ..., r_11 are received correctly in this case but
with enough uncertainty to warrant labeling them erasures. Altering
them changes neither the algorithm nor the decoded message.

Decoding now begins. Since the bound 2t + v ≤ n-k is satisfied,
the message is recovered completely. A flowchart of the algorithm
is shown in Figure B-1. There,


v  =  6  =  number  of  erasures 
{r  } 

i  =  0,  1,  ....  31 

2  =  erasure  locations  with 

ip  =  11,  i^  =  10,  . . .  and  i^  =  16. 

The  algorithm  first  generates  the  erasure  polynomial  A^v  ^(x) 

=  (x) .  The  dummy  variable,  v',  counts  the  number  of  erasures  by 

decreasing  from  5  to  0,  after  which  time  the  generation  of  the  errata- 
locator  polynomial  begins  and  a  different  path  is  followed  in  the 
flowchart . 

Table B-II displays the result of all iterations up to N = n-k-1
= 15, after which the errata locator polynomial, Λ^(15)(x), is
synthesized. Each line represents one iteration and is filled out
from left to right.

The first step in decoding is to compute R_N, the Nth term of
{R_i}, the forward transform of the received sequence {r_i}.
Here,

    R_i = r(α^i) = u_i(α^i),    i = 0, 1, ..., 30

Figure B-1. Flowchart of Transform Decoding Algorithm

Result of First n-k = 16 Iterations

where r(x) = r_0 + r_1 x + ... + r_30 x^30 is the 30th degree
polynomial which corresponds to {r_i}, and u_i(x) is the remainder
polynomial which results from division of r(x) by m_i(x). (The
remainders, u_i(x), are also listed in Table B-I.) The forward
transform, {R_i}, of the received sequence is


R 

0 

= 

a28 

>3 

00 

= 

a24 

R16 

= 

ct7 

R24 

= 

aio 

R 

1 

= 

a23 

R9 

= 

0.1° 

R17 

= 

R2  5 

= 

a1 

R2 

= 

a5 

RI0 

= 

a14 

R18 

= 

a2  7 

R2  6 

= 

a29 

R  3 

= 

a25 

R1 1 

= 

a13 

R 1 9 

= 

a10 

R2  7 

= 

a2  6 

R4 

= 

a23 

R 

12 

= 

a** 

R 

20 

= 

a'- 1 

R 

28 

= 

ai° 

R 

5 

= 

a26 

R13 

= 

a6 

R 

21 

= 

a8 

R29 

= 

a28 

R6 

= 

a 1 

V 

= 

0 

R22 

= 

a1 

R 

30 

= 

a10 

R 

7 

= 

a29 

R15 

= 

a 14 

R2  3 

= 

a7 

Since GF(2^5) is of characteristic 2, addition and subtraction
are both given by mod 2 addition of the 5-tuples. The first few
computations are:


Λ^(0)(x) = Λ^(-1)(x) - α^6 x B^(-1)(x)

         = 1 - α^6 x (1) = 1 - α^6 x = 1 + α^6 x

Λ^(1)(x) = Λ^(0)(x) - α^7 x B^(0)(x)

         = (1 + α^6 x) - α^7 x (1 + α^6 x)

         = 1 + (α^6 + α^7) x + α^13 x^2

         = 1 + α^24 x + α^13 x^2

Λ^(5)(x) = Λ^(4)(x) - α^11 x B^(4)(x)

         = 1 + (α^21 + α^11) x + (α^2 + α^1) x^2 + (α^10 + α^13) x^3
           + (α^14 + α^21) x^4 + (α^9 + α^25) x^5 + α^20 x^6

         = 1 + α^15 x + α^19 x^2 + α^8 x^3 + α^5 x^4 + α^18 x^5 + α^20 x^6.

When N = 6, v' = 0, and the algorithm branches with the computation

d^(6) = S_6 + Σ_{i=1}^{6} Λ_i^(5) S_{6-i}

      = α^1 + α^15 · α^26 + α^19 · α^29 + α^8 · α^25 + α^5 · α^5
        + α^18 · α^23 + α^20 · α^28

      = α^1 + α^10 + α^17 + α^2 + α^10 + α^10 + α^17

      = α^26.

Since d^(6) ≠ 0, the "NO" path is followed and

Λ^(6)(x) = Λ^(5)(x) - [α^26 / 1] x B^(5)(x)

         = 1 + (α^15 + α^26) x + (α^19 + α^10) x^2 + (α^8 + α^14) x^3
           + (α^5 + α^3) x^4 + (α^18 + α^0) x^5 + (α^20 + α^13) x^6
           + α^15 x^7

         = 1 + α^3 x + α^26 x^2 + α^4 x^3 + α^8 x^4 + α^1 x^5
           + α^4 x^6 + α^15 x^7.
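
The discrepancy computation and the update at N = 6 can be checked numerically. The sketch below is illustrative only: it assumes the primitive polynomial x^5 + x^2 + 1, takes S_0 ... S_6 equal to R_0 ... R_6 (with S_4 = α^29, the value used in the discrepancy sum), and assumes B^(5)(x) = Λ^(5)(x) with a normalizing denominator of 1. The variable names are hypothetical.

```python
# GF(2^5) antilog/log tables.
# ASSUMPTION: primitive polynomial x^5 + x^2 + 1 (not stated in this section).
PRIM = 0b100101
EXP, LOG = [0] * 62, {}
x = 1
for i in range(31):
    EXP[i] = EXP[i + 31] = x
    LOG[x] = i
    x <<= 1
    if x & 0b100000:
        x ^= PRIM


def mul(a, b):
    """Multiply two field elements stored as 5-bit integers."""
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


# Exponents of alpha read from the worked example (lowest index first).
S = [EXP[e] for e in (28, 23, 5, 25, 29, 26, 1)]      # S_0 ... S_6
lam5 = [EXP[e] for e in (0, 15, 19, 8, 5, 18, 20)]    # Lambda^(5), x^0 ... x^6

# Discrepancy: d6 = sum_{i=0}^{6} Lambda_i * S_{6-i}, with Lambda_0 = 1.
d6 = 0
for i in range(7):
    d6 ^= mul(lam5[i], S[6 - i])

# Update: Lambda^(6) = Lambda^(5) + d6 * x * B^(5).
# ASSUMPTION: B^(5) equals Lambda^(5) at this point of the iteration.
B5 = lam5
correction = [0] + [mul(d6, c) for c in B5]           # multiply by x: shift up
lam6 = [a ^ b for a, b in zip(lam5 + [0], correction)]

print(LOG[d6])                   # 26
print([LOG[c] for c in lam6])    # [0, 3, 26, 4, 8, 1, 4, 15]
```

Because the field has characteristic 2, the subtraction in the update is carried out as XOR.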

Now d^(7) ≠ 0, so the "NO" path is followed. This process
continues until errata locator polynomial Λ^(15)(x) is computed. The
iteration at N = 15 is:

Λ^(15)(x) = Λ^(14)(x) - α^1 x B^(14)(x)

          = 1 + (α^5 + 0) x + (α^16 + α^1) x^2 + (α^23 + α^26) x^3
            + (α^6 + α^24) x^4 + (α^2 + α^11) x^5 + (α^9 + α^26) x^6
            + (0 + α^9) x^7 + (α^6 + α^28) x^8 + (α^15 + α^24) x^9
            + (α^30 + α^5) x^10 + (α^22 + α^16) x^11

          = 1 + α^5 x + α^25 x^2 + α^21 x^3 + α^7 x^4 + α^18 x^5
            + α^8 x^6 + α^9 x^7 + α^13 x^8 + α^0 x^9 + α^26 x^10
            + α^12 x^11
At this point N is incremented by 1 to N = 16 ≥ n-k = 16, so
that the algorithm branches to generate the error sequence
(E_0, E_1, ..., E_14) = (S_16, S_17, ..., S_30), and then the decoded
message {M̂_i} = {M̂_i = R_{16+i} + S_{16+i}}. To illustrate:

l = N + k - n = 16 + 15 - 31 = 0

S_16 = Σ_{i=1}^{11} Λ_i^(15) S_{16-i}

     = α^5 · α^14 + α^25 · 0 + α^21 · α^6 + α^7 · α^4 + α^18 · α^13
       + α^8 · α^14 + α^9 · α^10 + α^13 · α^24 + α^0 · α^29
       + α^26 · α^1 + α^12 · α^26

     = α^19 + α^27 + α^11 + α^0 + α^22 + α^19 + α^6 + α^29 + α^27 + α^7

     = α^2

so M̂_0 = R_16 - S_16 = α^7 - α^2 = α^4, which is the first symbol of the
transmitted message.
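
The recursive extension of the syndromes and the recovery of message symbols can be sketched as follows. This is a hedged illustration only: it assumes the primitive polynomial x^5 + x^2 + 1, and the exponents for S_5 ... S_15 and for the coefficients of Λ^(15)(x) are readings of the worked example, several of which had to be reconstructed from damaged digits. Only symbols whose R values are legible (R_16 = α^7 and R_18 = α^27) are decoded here.

```python
# GF(2^5) antilog/log tables.
# ASSUMPTION: primitive polynomial x^5 + x^2 + 1 (not stated in this section).
PRIM = 0b100101
EXP, LOG = [0] * 62, {}
x = 1
for i in range(31):
    EXP[i] = EXP[i + 31] = x
    LOG[x] = i
    x <<= 1
    if x & 0b100000:
        x ^= PRIM


def mul(a, b):
    """Multiply two field elements stored as 5-bit integers."""
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


# S_5 ... S_15 as exponents of alpha; None marks the zero element (S_14 = 0).
known = {5: 26, 6: 1, 7: 29, 8: 24, 9: 10, 10: 14,
         11: 13, 12: 4, 13: 6, 14: None, 15: 14}
S = {j: (0 if e is None else EXP[e]) for j, e in known.items()}

# Lambda^(15) coefficients for x^0 ... x^11 (reconstructed readings).
lam15 = [EXP[e] for e in (0, 5, 25, 21, 7, 18, 8, 9, 13, 0, 26, 12)]

# Extend the sequence: S_N = sum_{i=1}^{11} Lambda_i * S_{N-i}.
for N in (16, 17, 18):
    acc = 0
    for i in range(1, 12):
        acc ^= mul(lam15[i], S[N - i])
    S[N] = acc

# Decoded symbols: M_i = R_{16+i} + S_{16+i} (addition is XOR).
M0 = EXP[7] ^ S[16]     # R_16 = alpha^7
M2 = EXP[27] ^ S[18]    # R_18 = alpha^27

print(LOG[S[16]], LOG[M0], LOG[M2])   # 2 4 28
```

Under these readings the sketch gives S_16 = α^2, M̂_0 = α^4 and M̂_2 = α^28.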


The algorithm continues until M̂_i, i = 0, 1, ..., 14 are computed.
They are:

M̂_0 = α^4      M̂_8  = α^7
M̂_1 = α^12     M̂_9  = α^15
M̂_2 = α^28     M̂_10 = α^24
M̂_3 = α^29     M̂_11 = α^1
M̂_4 = α^17     M̂_12 = α^6
M̂_5 = α^3      M̂_13 = α^14
M̂_6 = α^30     M̂_14 = α^20
M̂_7 = α^10